Visualization


Master in Biomedical Engineering

Grupo de Bioingeniería
y Telemedicina

Advanced Visualization of Health Data
Gema García Sáez
Contents

 Introduction to visual data analytics

 Principles of Visualization

 Poor Visualizations

 Software Tools -Tutorials


Introduction
What is visual data analytics used for?

Reference: Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis; published onli
Feb 19. https://doi.org/10.1016/S1473-3099(20)30120-1
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
https://nssac.bii.virginia.edu/covid-19/dashboard/
Open data about cancer

http://gco.iarc.fr/
Definition – Visual analytics

Visual Analytics: the science of analytical reasoning facilitated by interactive visual interfaces [*Thomas, Cook, 2005]

Visual Analytics combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex datasets [**Keim, 2008]

 Visualization of Big Data aids the comprehension and communication of the data
 Better decision making
 Interactive plots help to explain ideas and relationships in the data

*Thomas, J., Cook, K.: Illuminating the Path: Research and Development Agenda for Visual Analytics. IEEE-Press (2005)
**D. Keim, G. Andrienko, et al. “Visual Analytics: Definition, Process, and Challenges”. A. Kerren et al. (Eds.): Information Visualization,
LNCS 4950, pp. 154–175, 2008. Springer-Verlag Berlin Heidelberg 2008
Introduction– Visual analytics (II)
 Emerged as a response to the ‘information overload’ problem
 Decision makers are bombarded with irrelevant information or inappropriate
data
 Allows the exploration of data:
 Visualization and theoretical models to generate new knowledge
 Result:
 Interactive Graphical interfaces
 Trends of data
 Changes in data over time
 Use:
 To communicate the message to non-expert users
Introduction– Visual analytics (III)
 Visual analytics focuses on:
 Analytical techniques allowing comprehensive insights and support in decision
making
 Interactive techniques and visual representations
 Techniques used for production, presentation and communication of the
analytical reasoning in a specific context for a targeted audience
 Transformations and representations of data to support analysis and
visualization
 Quantitative visualization (E.g. perform measurements directly on the visualization of
data)
 Different views of the data where different representations of the same data
can uncover different patterns
 Collaborative aspects
Visual Analytics Process

Raw data: Input

Thomas, J., Cook, K.: Illuminating the Path: Research and Development Agenda for Visual Analytics. IEEE-Press (2005)
Principles of Visualization
Framework to design visualization and evaluation
 Domain of application
 Characterize the problem and the available data. What? Who are the target users?
 Map Data and Tasks into abstract operations
 Translate from specifics of domain to vocabulary of visualization
 Data Abstraction: What data needs to be shown?
 It could be necessary to transform the data
 Task Abstraction: Why is the user looking at it?
 Interaction and Visual encoding: How is it shown?
 Interaction: how to manipulate the data?
 Visual encoding idiom: how to draw?
 Algorithmic Implementation
 Efficient computation
[Figure: nested model with four levels: Domain, Data/Task Abstraction, Interaction and visual encoding, Algorithm]
Munzner. A Nested Model of Visualization Design and Validation. IEEE TVCG 15(6):921-928, 2009 (Proc. InfoVis 2009).
Data Abstractions (I)

Data types:
 Attributes, Items, Links, positions, grids
Dataset types:
 Tables: Items (rows), Attributes (columns)
 Networks and Trees: Items (nodes), Links, Attributes
 Spatial data, Fields (continuous): Grids, Positions, Attributes
 Spatial data, Geometry: Items, Positions
 Clusters, Lists: Items
Dataset availability:
 Static vs Dynamic
Data Abstractions (II)

Attribute types
 Categorical: ▲ ♥ ♦ ♠ ♣
 No explicit order, it could imply a hierarchy
 Ordered:
 Ordinal S, M, L, XL
 Quantitative

 Ordering direction:
 Sequential
 Diverging
 Cyclic
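The attribute types above map onto R data types. The following is a minimal sketch in base R; the variable names and values are hypothetical and only illustrate the distinction between categorical, ordinal and quantitative attributes.

```r
# Hypothetical values, chosen only to illustrate the three attribute types
blood_group <- factor(c("A", "B", "AB", "O", "A"))      # categorical: no order
shirt_size  <- factor(c("M", "S", "XL", "L", "M"),
                      levels = c("S", "M", "L", "XL"),
                      ordered = TRUE)                    # ordinal: ordered levels
weight_kg   <- c(70.5, 58.2, 91.0, 80.3, 66.7)           # quantitative

is.ordered(blood_group)        # the categorical factor carries no order
is.ordered(shirt_size)         # the ordinal factor does
smaller_than_L <- shirt_size < "L"   # comparisons are defined for ordered factors
mean(weight_kg)                # arithmetic is meaningful for quantitative data
```

Choosing the R type that matches the attribute type helps plotting functions pick a sensible default encoding, for example discrete colors for factors and continuous scales for numeric vectors.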
Task Abstractions – Why analyze data?
 Associated Actions: Why is the user looking at it?
 Analyze:
High-level choices
Discover, present, derive,…
 Search information
Find a known/unknown item
 Query:
Find out about characteristics of an item
Identify, compare, summarize
Task Abstractions – Why analyze data?
 Targets: Aspects of interest for the users
What data is being used?
 All the dataset:
 Trends, outliers, features
 Some attributes:
 Correlation, similarity, dependency,…

https://www.paho.org/data/index.php/en/mnu-topics/indicadores-dengue-en/dengue-nacional-en/252-dengue-pais-ano-en.html?start=2
Methods to design visualizations – How is data shown?
 How to build graphical representations to allow manipulation of
data:
 Encode
 Arrange or order data, Map channels (color, hue…)
 Manipulate
 Change, Select, Navigate
 Facets
 Juxtapose, partition, superimpose
 Reduce
 Filter, aggregate, embed
Encode

 One of the main aspects of design


 Visual encoding
 Define the combination of marks and visual channels
 Actions associated to the spatial place of the items
 Analyze items structure:

Examples of marks and channels:
1: Channel: Vertical position. Mark: Line
2: Channels: Vertical position, Horizontal position. Mark: Point
3: Channels: Vertical position, Horizontal position, Color. Mark: Point
4: Channels: Vertical position, Horizontal position, Color, Size (area). Mark: Point
Encode

How do we use the Area? Arrange the data


Express values

Separate

Order

Align
Scatterplot
 Express values
 Quantitative attributes
 Data
2 quantitative attributes
 Mark:
Point
 Channels
Horizontal + Vertical position
 Tasks
Find trends, outliers, distribution, correlation, clusters
 Scalability
Hundreds of items
A layered grammar of graphics. Wickham. Journ. Computational and Graphical Statistics 19:1 (2010), 3–28
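As an illustration of the scatterplot idiom, the sketch below uses base R graphics and the built-in `cars` dataset (two quantitative attributes: speed and stopping distance). The fitted line is an optional addition to support the "find trends" task and is not part of the idiom itself.

```r
# Built-in 'cars' dataset: speed (mph) and stopping distance (ft)
plot(cars$speed, cars$dist,
     pch = 19,                                   # point marks
     xlab = "Speed (mph)",                       # horizontal position channel
     ylab = "Stopping distance (ft)",            # vertical position channel
     main = "Scatterplot: two quantitative attributes")

# Optional: overlay a linear trend to support the "find trends" task
trend <- lm(dist ~ speed, data = cars)
abline(trend, col = "red")

# Numerical summary of the correlation visible in the plot
r <- cor(cars$speed, cars$dist)
```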
Categorical Regions

Separate Order Align

 Regions: contiguous bounded areas distinct from each other


 Use space to separate (proximity)
 Use ordered attributes to order and align regions
 Types of graphics
 Bar chart
 Line chart
 Stacked bar chart
 Heatmap
Bar Chart vs Line Chart
 The choice depends on the type of attributes
 Bar Chart:
 Categorical
 Line Chart:
 Ordered
 Line charts are not recommended for categorical attributes
 The implication of a trend is so strong that it overrides semantics
 E.g. readers of a line chart of height by gender concluded: "the more male a person is, the taller he or she is"
Bars and Lines: A Study of Graphic Communication. Zacks and Tversky. Memory and Cognition 27:6 (1999), 1073–1079
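The choice between the two idioms can be sketched in base R as follows. The data values and variable names are hypothetical: a categorical key (hospital ward) is drawn with bars, while an ordered key (month) is drawn with a line.

```r
# Hypothetical data for illustration only
patients_by_ward    <- c(Cardiology = 42, Neurology = 27, Oncology = 35)  # categorical key
admissions_by_month <- c(31, 28, 36, 40, 38, 45)                          # ordered key (months 1..6)

op <- par(mfrow = c(1, 2))   # two plots side by side
barplot(patients_by_ward, ylab = "Patients",
        main = "Categorical: bar chart")
plot(seq_along(admissions_by_month), admissions_by_month, type = "l",
     xlab = "Month", ylab = "Admissions",
     main = "Ordered: line chart")
par(op)                      # restore the previous graphics settings
```

Drawing the ward counts with `type = "l"` would suggest a trend between wards that does not exist, which is the effect the slide warns about.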
Heatmap
 Two dimensions
 Immediate summary of the information
 Data
 2 categorical attribs. (gene, experimental condition)
 1 quantitative attribute (expression levels)
 Marks: area
 Separate and align in 2D matrix
 Channels
 Color by quantitative attribute
 Task
 Find Clusters, outliers
 Scalability
 1M items, 100s of categorical levels, ~10 quantitative
attribute levels
Example: Heatmap
 Frequently used to visualize genetic information
 Data
 Rows: genes
 Columns: Samples, conditions

 Gradient of color, intensity levels
 Represents levels of gene expression
 Red: up-regulated genes; green: down-regulated genes; black: unchanged expression
 Example:
 Group genes according to similarity in gene expression patterns
 Combined with clustering analysis
 Aim: identify genes associated with a disease
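A heatmap combined with clustering can be sketched with the base R function `heatmap()`, which reorders rows and columns by hierarchical clustering. The expression matrix below is simulated (hypothetical genes and samples), not real data, and the green-black-red palette simply mirrors the color convention described above.

```r
set.seed(1)
# Simulated expression matrix (hypothetical): 20 genes x 6 samples
expr <- matrix(rnorm(120), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3   # genes 1-10 up-regulated in samples 4-6

# Green (low) - black (unchanged) - red (high)
pal <- colorRampPalette(c("green", "black", "red"))(64)

# heatmap() clusters rows and columns and draws the dendrograms;
# scale = "row" standardizes each gene before coloring
hm <- heatmap(expr, col = pal, scale = "row")
```

The returned list holds the row and column orderings (`hm$rowInd`, `hm$colInd`) produced by the clustering, which can be reused to inspect which genes were grouped together.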
Arrange spatial data

 Geometry
 Geographical data
 Other geometries derived from the data
 Spatial Fields
 Scalar fields
 One value per cell

 Vector and Tensor fields


 Many values per cell
Geographical data

 Famous map by John Snow in 1854
 Deaths caused by a cholera outbreak in Soho, London
 Location of water pumps in the area
 Identified a significant clustering of deaths around a certain pump
 Removing the handle of the pump stopped the outbreak

https://blog.rtwilson.com/john-snows-famous-cholera-analysis-data-in-modern-gis-formats/
Geographical data
 Choropleth Maps
 Coloring scheme (different colors or
a graduated color scale) inside defined
areas on a map
 Aim: understand spatial relationships
 Data
 Geographic geometry
 Table with 1 quantitative attribute per
region
 Encoding
 Use given geometry for area mark
boundaries
 Sequential segmented color map
Geographical data
 Choropleth Maps
 Pros:
 Used to report area values at any scale, from global to local
 Helpful for finding intriguing hot spots, detecting relationships between the encoded
variable and geographic location (and the many variables entangled with location), or
letting people know how their area compares with others
 Cons:
 The viewer cannot gain detailed information on any area's internal conditions
 It can be solved by making the map interactive
 The areas are not uniform:
 The visual importance of each area follows its geographic size rather than the number of people living there, giving sparsely populated areas great visual emphasis
Arrange networks and trees

 Node-link diagrams:
 Trees and networks

 Adjacency matrix:
 Trees and networks

 Enclosure:
 Trees
Node-Link Diagrams:
Radial node-link tree
 Encoding
 Link connection marks
 Point node marks
 Radial axis orientation
 Angular proximity: siblings
 Distance from center: depth in
tree

 Tasks
 Understanding topology,
following paths
 Scalability
 1K - 10K nodes
Implements the Reingold-Tilford algorithm for efficient, tidy arrangement
of layered nodes. The depth of nodes is computed by distance from the
root, leading to a ragged appearance
http://mbostock.github.com/d3/ex/tree.html
Example: radial node-link tree
 Phylogenetic tree of the bacterial
domain
 Identify evolutionary relationships
among organisms or groups of
organisms
 Based on a concatenated alignment
of 31 broadly conserved protein-
coding genes

DY Wu et al. Nature 462, 1056-1060 (2009). doi:10.1038/nature08656
Node-Link Diagrams: Force-directed placement
 Use of lines to represent links and to connect items
 Visual encoding
 Link: connection marks, Node: point marks
 Spatial position: no meaning directly encoded
 Free to minimize crossings
 Proximity semantics?
 Sometimes meaningful
 Sometimes arbitrary
 Tension with length
 Long edges more visually salient than short
 Aims: Explore topology; locate paths, clusters
http://mbostock.github.com/d3/ex/force.html


Example

 Force directed graph


 Shows the protein interaction network of a bacterium (Treponema)
 Colored dots represent different types of proteins
 Lines show the interactions

https://commons.wikimedia.org/wiki/File:The_protein_interaction_network_of_Treponema_pallidum.png
Adjacency matrix
 Data
 Network
 Transform into same data/encoding as heatmap
 Derived data: table from network
 1 quantitative attribute
 Weighted edge between nodes
 2 categorical attributes: node list x 2
 Visual encoding
 Cell shows presence/absence of edge
 Triangular matrix suffices for undirected links
 Scalability
 1K nodes, 1M edges
NodeTrix: a Hybrid Visualization of Social Networks. Henry, Fekete, and McGuffin. IEEE TVCG (Proc. InfoVis) 13(6):1302-1309, 2007
Points of view: Networks. Gehlenborg and Wong. Nature Methods 9:115
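The derivation of a table from a network can be sketched in base R. The edge list below is hypothetical; the code builds the node-by-node matrix (2 categorical attributes: node list x 2) holding the edge weight (1 quantitative attribute), then draws it with `image()` using the same encoding as a heatmap.

```r
# Hypothetical undirected, weighted network given as an edge list
edges <- data.frame(from   = c("A", "A", "B", "C"),
                    to     = c("B", "C", "C", "D"),
                    weight = c(1, 2, 1, 3),
                    stringsAsFactors = FALSE)
nodes <- sort(unique(c(edges$from, edges$to)))
n <- length(nodes)

# Derived table: one cell per pair of nodes, 0 means "no edge"
adj <- matrix(0, nrow = n, ncol = n, dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- edges$weight
adj[cbind(edges$to, edges$from)] <- edges$weight   # undirected: mirror each edge

# Draw the matrix with row 1 at the top, as in a printed table
image(1:n, 1:n, t(adj)[, n:1], axes = FALSE, xlab = "", ylab = "",
      col = hcl.colors(12, "Blues 3", rev = TRUE))
axis(1, at = 1:n, labels = nodes)
axis(2, at = 1:n, labels = rev(nodes))
```

Because the network is undirected the matrix is symmetric, so one triangle carries all the information.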
Example

 Protein–protein interaction network in hepatoma cells
 A: Network
 B, C: Curated host-host protein interactions from the CORUM database

https://www.researchgate.net/figure/A-Complete-HCV-Host-Protein-Protein-Interaction-Network-in-Hepatoma-Cells-A-Network_fig2_271220492
Enclosure: Treemaps

 Encoding
 Area containment marks for hierarchical structure
 Rectilinear orientation
 Size encodes quantitative attributes
 Scalability
 1 M leaf nodes
 Example:
 Analyze statistical data per country
 "Elderly population in Europe":
- Coloured rectangles represent the ratio of elderly people ("age group 65 and above") in the population
- The size of each rectangle in the treemap represents the "Total Population"

http://ncva.itn.liu.se/education-geovisual-analytics/treemap?l=en
http://mitweb.itn.liu.se/GAV/dashboard/#story=data/nuts-regions-ageing-population-in-europe-2010.xml&layout=[map,treemap]
Example: Treemap

 Fertility rates
according to the total
population of each
country

https://ncva.itn.liu.se/education-geovisual-analytics/treemap?l=en
http://mitweb.itn.liu.se/GAV/dashboard/#story=data/nuts-regions-ageing-population-in-europe-2010.xml&layout=[map,treemap]
Color: Luminance, Saturation, Hue

 A correct choice of color is essential to perceive the data
 A poor choice could be confusing and make the user misunderstand the message
 3 channels: luminance, saturation, hue
 Identify categorical variables
 Hue
 Ordered variables
 Saturation, luminance
Color in categorical vs ordered data

Seriously Colorful: Advanced Color Principles & Practices. Stone. Tableau Customer Conference 2014
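The difference between color for categorical and ordered data can be sketched with the `hcl.colors()` function from base R (available from R 3.6.0). The palette names used here ("Dark 3", "Blues 3", "Blue-Red") are among those listed by `hcl.pals()`; the choice of these three is only an illustration.

```r
# Categorical data: hue varies while luminance and saturation stay similar
categorical <- hcl.colors(4, palette = "Dark 3")
# Ordered (sequential) data: luminance and saturation vary within one hue
sequential  <- hcl.colors(5, palette = "Blues 3")
# Ordered (diverging) data: two hues around a neutral midpoint
diverging   <- hcl.colors(5, palette = "Blue-Red")

# Show the three palettes as color strips
op <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, 4), col = categorical, axes = FALSE, main = "Categorical: hue")
barplot(rep(1, 5), col = sequential,  axes = FALSE, main = "Sequential: luminance, saturation")
barplot(rep(1, 5), col = diverging,   axes = FALSE, main = "Diverging: two hues")
par(op)
```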
How to handle complexity?
 Derive new data to show in a different
view
 Manipulate
 Change the view over time
 Facet across multiple views
 Juxtapose
 Superimpose
 Partition
 Reduce items/attributes within single
view
 Filter
 Aggregate
 Embed
Manipulate the data

Changes over time


Selection of parameters
Allow navigation
Arrange:
Rearrange, reorder
Aggregation level
Interaction entails changes
Manipulate: select and highlight
 Selection
 Basic operation for most interaction tools
 Design choices
 How many selection types are possible?
 Clicks vs hover: heavyweight, lightweight
 Primary vs secondary selection: semantics (e.g. source/target)
 Highlight: change visual encoding for selection targets
 Color
 Other channels:
 E.g. Motion
 Add explicit connection marks between items
Example: Parallel coordinates
Linked Highlighting
Juxtapose and coordinate views
See how regions contiguous in one view are distributed within another
Why juxtapose views?
Benefits: Eyes vs Memory
Lower cognitive load to move eyes between 2 views than remembering previous state with single changing view
Costs: Display area
2 views side by side each have only half the area of one view
Visual Exploration of Large Structured Datasets. Wills. Proc. New Techniques and Trends in Statistics (NTTS), pp. 237–246. IOS Press, 1995
Partition into views
Partitioning and grouping are inverse terms
Partitioning: starting from the top and gradually refining (top-down)
Grouping: bottom-up process of gradually consolidating
How to divide data between views
Split into regions by attributes
Encode association between items using spatial proximity
Order of splits has major implications for what patterns are visible
Partition: List alignment
 Single bar chart with grouped bars, split by state into regions: complex glyph within each region showing all ages
 Compare: easy within state, hard across ages
 Small-multiple bar charts, split by age into regions: one chart per region
 Compare: easy within age, harder across states
Reduce items and attributes


 Filter
 Advantages: straightforward and intuitive to understand and compute
 Disadvantages: out of sight, out of mind

 Aggregation
 Advantages: inform about whole set
 Disadvantages: difficult to avoid losing signal

 Not mutually exclusive


 Combine filter, aggregate
 Combine reduce, change, facet
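Filtering and aggregation can be sketched in base R with the built-in `ToothGrowth` dataset (60 observations of tooth length `len`, supplement type `supp` and dose `dose`). Filtering keeps a subset of the items; aggregation replaces groups of items with a summary value.

```r
# Filter: keep only the items that match a condition (the rest are hidden)
high_dose <- subset(ToothGrowth, dose == 2)

# Aggregate: summarize every group, so the whole dataset is still represented
mean_len <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Combine both: filter first, then aggregate the remaining items
high_dose_mean <- aggregate(len ~ supp, data = high_dose, FUN = mean)

# Plot the aggregated view: one bar per supplement within each dose
barplot(len ~ supp + dose, data = mean_len, beside = TRUE,
        legend.text = TRUE, xlab = "Dose", ylab = "Mean tooth length")
```

The formula interface of `barplot()` used in the last call requires R 4.0.0 or later; with older versions the aggregated table would first need to be reshaped into a matrix.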
Example: dynamic filtering

 Browse through tightly coupled interaction

Visual information seeking: Tight coupling of dynamic query filters with starfield displays. Ahlberg and Shneiderman. Proc. ACM Conf. on Human
Factors in Computing Systems (CHI), pp. 313–317, 1994
Example Facets

 Titanic survival
Poor visualizations

 Practices to avoid (the ironic "rules" for displaying data badly):
 Using inappropriate data visualization techniques
 Using graphics of low quality
 Using pseudo-3D and color gratuitously
 Using pie charts (especially in color and 3D)
 Using a poorly chosen scale
 Ignoring significant figures
 Using inappropriate marks or background

How to display data badly. http://kbroman.org


H. Wainer: How to display data badly. American Statistician 38(2): 137–147, 1984
Poor visualizations - Avoid Pie Charts
 The eye is good at judging linear measures and bad at judging
relative areas

[Figure: pie charts #1 and #2]
In #1, which one has the bigger percentage, C or D?
Who did better between time 1 and time 2, candidate B or candidate D?
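The point about linear measures versus areas can be tried out with base R. The vote shares below are hypothetical and were chosen to be close to each other, which is where the pie chart becomes hard to read.

```r
# Hypothetical vote shares (%) for five candidates
share <- c(A = 18, B = 21, C = 20, D = 22, E = 19)

op <- par(mfrow = c(1, 2))   # same data, two encodings side by side
pie(share, main = "Pie chart: angles and areas")
barplot(share, ylab = "Share (%)", main = "Bar chart: aligned lengths")
par(op)
```

In the bar chart the ranking of C and D can be read directly from the bar heights, whereas in the pie chart the two slices look almost identical.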
Poor visualizations – Bar plots for paired data
 If we want to compare the treatment between 2 groups and the dataset is small…

How to show that there is an increase after treatment?
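One possible answer, sketched below in base R, is to draw every pair of measurements and connect them with a line, so that the direction of change of each subject is visible. The before/after values are hypothetical, and this is only one of several reasonable designs (plotting the differences directly is another).

```r
# Hypothetical paired measurements for 8 subjects
before <- c(5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.0)
after  <- c(5.9, 5.2, 6.4, 6.3, 5.0, 6.5, 6.6, 5.8)

# Empty plot with two positions on the horizontal axis
plot(NA, xlim = c(0.8, 2.2), ylim = range(before, after),
     xaxt = "n", xlab = "", ylab = "Measurement",
     main = "Paired data: one line per subject")
axis(1, at = 1:2, labels = c("Before", "After"))

# One segment per subject: an upward slope means an increase
segments(1, before, 2, after, col = "grey40")
points(rep(1, length(before)), before, pch = 19)
points(rep(2, length(after)),  after,  pch = 19)

# The paired differences carry the actual message
diffs <- after - before
```

A bar plot of the two group means would hide the pairing, while here every line slopes upward and the increase is immediate.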
How to improve comparisons?
Show as much data as possible

 Take care not to obscure the message

How to order data?
 Is alphabetical order recommended?

References

 Brehmer and Munzner “A Multi-Level Typology of Abstract Visualization Tasks”. IEEE Trans.
Visualization and Computer Graphics (Proc. InfoVis) 19:12 (2013), 2376–2385.
 T. Munzner. “Visualization Analysis and Design”. AK Peters Visualization Series, CRC Press, 2014
 Ware. “Information Visualization: Perception for Design”, 3rd edition. Morgan Kaufmann
/Academic Press, 2004
 Amar, Eagan, and Stasko. Low-Level Components of Analytic Activity in Information Visualization.
Proc. IEEE InfoVis 2005, p 111–117.
 Heer and Shneiderman. “A taxonomy of tools that support the fluent and flexible use of
visualizations”. Communications of the ACM 55:4 (2012), 45–54.
 Munzner. “A Nested Model of Visualization Design and Validation”. IEEE TVCG 15(6):921-928, 2009
(Proc. InfoVis 2009)
 Cookbook for R. Graphs with ggplot2. http://www.cookbook-r.com/Graphs/
 Aigner, Miksch, Schumann, and Tominski. “Visualization of Time-Oriented Data”. Springer, 2011
 Daniel Keim, Gennady Andrienko, Jean-Daniel Fekete, Carsten Görg, Jörn Kohlhammer, and
Guy Melançon. “Visual Analytics: Definition, Process, and Challenges”. A. Kerren et al. (Eds.):
Information Visualization, LNCS 4950, pp. 154–175, 2008. Springer-Verlag Berlin Heidelberg 2008
Practising with RStudio & Notebooks
Configure the working directory

 Check the current working directory:
 getwd()
 If you need to change it:
 setwd(your_path)
 E.g. setwd("C:/AIDM")
 Select
 Session -> Set working directory
 Choose a directory, or
 To source file location if you have opened a file in your path
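The same steps can be run from the console. In this sketch `tempdir()` stands in for your own project path and `example.csv` is a hypothetical file name; the point is that relative file names are resolved against the working directory.

```r
old_wd <- getwd()      # remember the current working directory

setwd(tempdir())       # tempdir() stands in for your own project path
writeLines(c("id,value", "1,10"), "example.csv")   # written relative to the new directory
found <- file.exists("example.csv")                # TRUE while we are in that directory
example <- read.csv("example.csv")

setwd(old_wd)          # restore the original working directory
```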
Practicing with R and Notebooks
1. Download the 3 Tutorials stored in the Moodle
2. Open RStudio and load the 3 Notebooks,
following the order:
- Tutorial R MarkDown.Rmd
- Tutorial Dataframes.Rmd (use .csv file)
- Tutorial Plots.Rmd
3. Execute the code chunks using one of the following options:
 Green execution button
 Knit to HTML
 Download the results of each tutorial executed
in an HTML file and upload them to the Moodle
 It can be delivered individually or in groups of 2-3
students
Answer the questionnaire

After executing the tutorials, answer the questionnaire in:


https://b.socrative.com/login/student/
Room name: AIDM2223

Enter your name to get a score


Visual analytics: Shiny
Interactive data visualization in R
 Multiple alternatives:
 plotly
 Graphing library that allows interaction (e.g. zoom, selection,…)
 library(plotly)
 GGobi
 Open source software for multi-dimensional data exploration
 library(rggobi)
 iPlots
 install.packages("iplots",dep=TRUE)
 ggvis
 Similar to ggplot2, allows use in a browser
 Shiny
Example Shiny Application
Example Shiny App – text analysis

https://www.rgonzo.us/shiny/apps/textanalysis/
Shiny
 Environment to allow creating interactive applications in R
 Allows online publishing of visualization tools (Web server)
 Used with the RStudio software
> install.packages("shiny")

 Structure of a Shiny application:


 Script for the user interface (ui.R):
 Defines the view and the interaction elements to allow data analytics
 Script for the server (server.R):
 Defines the necessary instructions to execute the application

 The two components are executed together from the directory that contains them (dir_name):


> library(shiny)
> runApp(dir_name)
Shiny
 Alternative structure:

 One script (app.R) which contains the code related to:

 User interface (equivalent to ui.R)
+
 Instructions to execute the app (equivalent to server.R):

library(shiny)
ui <- fluidPage(

)
server <- function(input, output, session) {

}
shinyApp(ui, server)

 The application is executed with app.R


Building interfaces

 ui.R scripts
 The function fluidPage() creates a display that automatically adjusts to the dimensions of the user's browser window
 The interface is laid out by placing elements in the fluidPage() function
 Simple user interface with a title panel and a sidebar layout, which includes a sidebar panel and a main panel:
ui <- fluidPage(
  titlePanel("title panel"),
  sidebarLayout(
    sidebarPanel("sidebar panel"),
    mainPanel("main panel")
  )
)
 Other options available:
 Format text: h1("Title")
 Add widgets
 Add tab panels
 Add navigation bars
Adding reactivity

 server.R
 Contains the code that generates the output in response to the user's interaction with the user interface
 Reactive output automatically responds when the user interacts with a widget
 Different types of functions are available:
 Output functions (ui.R):
 htmlOutput -> raw HTML
 imageOutput -> image
 plotOutput -> plot
 tableOutput -> table
 textOutput -> text
 uiOutput -> raw HTML
 Render functions (server.R):
 renderUI -> a Shiny tag object or HTML
 renderImage -> images (saved as a link to a source file)
 renderPlot -> plots
 renderTable -> data frame, matrix, other table-like structures
 renderPrint -> any printed output
 renderText -> character strings
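How an output function in the interface is paired with a render function in the server can be sketched with a minimal single-file app. The input id `n` and the output ids `summary` and `scatter` are hypothetical names chosen for this example; the app uses the built-in `cars` dataset.

```r
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Number of rows", min = 1, max = 50, value = 10),
  textOutput("summary"),   # filled by renderText() in the server
  plotOutput("scatter")    # filled by renderPlot() in the server
)

server <- function(input, output, session) {
  # Re-executed automatically every time input$n changes
  output$summary <- renderText({
    paste("Showing the first", input$n, "rows of 'cars'")
  })
  output$scatter <- renderPlot({
    plot(head(cars, input$n))
  })
}

# Build the app object; it is not launched until it is printed or run
app <- shinyApp(ui, server)
```

In an interactive session the app can be started with `runApp(app)`. The id given to each output function in the interface must match the name assigned on `output$...` in the server.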
Example basic Shiny app (I)

 Interface -> ui.R


ui <- fluidPage(
h1("Example app"),
sidebarLayout(
sidebarPanel(
numericInput("nrows", "Number of rows", 10),
actionButton("save", "Save")
),
mainPanel(
plotOutput("plot"),
tableOutput("table")
)
)
)
Example basic Shiny app (II)

 Server -> server.R


server <- function(input, output, session) {
  # Select the number of rows indicated by the user
  df <- reactive({
    head(cars, input$nrows)
  })
  # Draw a chart
  output$plot <- renderPlot({
    plot(df())
  })
  # Update the table
  output$table <- renderTable({
    df()
  })
  # Use observeEvent to tell Shiny what action to take when input$save is clicked
  observeEvent(input$save, {
    write.csv(df(), "data.csv")  # Save data to a csv file
  })
}
Examples basic Shiny applications

 “Hello Shiny” app

shiny::runExample("01_hello")

 The session will be busy while the "Hello Shiny" app is active, so you will not be able to run any R commands
 To get your R session back, hit escape or click the stop sign icon (found in the upper right corner of the RStudio console panel)
Execute and analyze other examples
 runExample("01_hello")

 runExample("02_text") # Tables and data frames

 runExample("03_reactivity") # use of interactive components

 runExample("04_mpg") # Use of global variables

 runExample("05_sliders") # use of sliding bars to select ranges

 runExample("06_tabsets") # use of tabs

 runExample("07_widgets") # use of widgets

 Other examples:
 http://Shiny.rstudio.com/gallery/
 https://shiny.rstudio.com/tutorial/
 https://shiny.rstudio.com/gallery/hospital-data-antimicrobial.html
