
BETTER DATA
VISUALIZATIONS

COLUMBIA UNIVERSITY PRESS, NEW YORK
Columbia University Press
Publishers Since 1893
New York Chichester, West Sussex
cup.columbia.edu
Copyright © 2021 Columbia University Press
All rights reserved
Chapter 11, “Tables,” based on Jonathan A. Schwabish, “Ten Guidelines for Better Tables,”
Journal of Benefit-Cost Analysis 11, no. 2 (2020): 151–178. Reprinted with permission.
Library of Congress Cataloging-in-Publication Data

Names: Schwabish, Jonathan A., author.
Title: Better data visualizations : a guide for scholars, researchers, and wonks /
Jonathan Schwabish.
Description: New York : Columbia University Press, [2021] | Includes bibliographical
references and index.
Identifiers: LCCN 2020017814 (print) | LCCN 2020017815 (ebook) | ISBN 9780231193108
(hardback) | ISBN 9780231193115 (trade paperback) | ISBN 9780231550154 (ebook)
Subjects: LCSH: Information visualization. | Visual analytics.
Classification: LCC QA76.9.I52 S393 2021 (print) | LCC QA76.9.I52 (ebook) |
DDC 001.4/226—dc23
LC record available at https://lccn.loc.gov/2020017814
LC ebook record available at https://lccn.loc.gov/2020017815
Columbia University Press books are printed on permanent and durable acid-free paper.
Printed in the United States of America

For Aunt Vivi. Our Mendales. With love and Diet Coke.

CONTENTS
INTRODUCTION

PART ONE: PRINCIPLES OF DATA VISUALIZATION

1. VISUAL PROCESSING AND PERCEPTUAL RANKINGS
    Anscombe’s Quartet
    Gestalt Principles of Visual Perception
    Preattentive Processing
2. FIVE GUIDELINES FOR BETTER DATA VISUALIZATIONS
    Guideline 1: Show the Data
    Guideline 2: Reduce the Clutter
    Guideline 3: Integrate the Graphics and Text
    Guideline 4: Avoid the Spaghetti Chart
    Guideline 5: Start with Gray
3. FORM AND FUNCTION: LET YOUR AUDIENCE’S NEEDS DRIVE YOUR DATA VISUALIZATION CHOICES
    Changing How We Interact with Data
    Let’s Get Started

PART TWO: CHART TYPES

4. COMPARING CATEGORIES
    Bar Charts
    Paired Bar
    Stacked Bar
    Diverging Bar
    Dot Plot
    Marimekko and Mosaic Charts
    Unit, Isotype, and Waffle Charts
    Heatmap
    Gauge and Bullet Charts
    Bubble Comparison and Nested Bubbles
    Sankey Diagram
    Waterfall Chart
    Conclusion
5. TIME
    Line Chart
    Circular Line Chart
    Slope Chart
    Sparklines
    Bump Chart
    Cycle Chart
    Area Chart
    Stacked Area Chart
    Streamgraph
    Horizon Chart
    Gantt Chart
    Flow Charts and Timelines
    Connected Scatterplot
    Conclusion
6. DISTRIBUTION
    Histogram
    Pyramid Chart
    Visualizing Statistical Uncertainty with Charts
    Box-and-Whisker Plot
    Candlestick Chart
    Violin Chart
    Ridgeline Plot
    Visualizing Uncertainty by Showing the Data
    Stem-and-Leaf Plot
    Conclusion
7. GEOSPATIAL
    Choropleth Map
    Cartogram
    Proportional Symbol and Dot Density Maps
    Flow Map
    Conclusion
8. RELATIONSHIP
    Scatterplot
    Parallel Coordinates Plot
    Radar Charts
    Chord Diagram
    Arc Chart
    Correlation Matrix
    Network Diagrams
    Tree Diagrams
    Conclusion
9. PART-TO-WHOLE
    Pie Charts
    Treemap
    Sunburst Diagram
    Nightingale Chart
    Voronoi Diagram
    Conclusion
10. QUALITATIVE
    Icons
    Word Clouds and Specific Words
    Word Trees
    Specific Words
    Quotes
    Coloring Phrases
    Matrices and Lists
    Conclusion
11. TABLES
    The Ten Guidelines of Better Tables
    Demonstration: A Basic Data Table Redesign
    Demonstration: A Regression Table Redesign
    Conclusion

PART THREE: DESIGNING AND REDESIGNING YOUR VISUAL

12. DEVELOPING A DATA VISUALIZATION STYLE GUIDE
    The Anatomy of a Graph
    Color Palettes
    Defining Fonts for the Style Guide
    Guidance for Specific Graph Types
    Exporting Images
    Accessibility, Diversity, and Inclusion
    Putting it All Together
13. REDESIGNS
    Paired Bar Chart: Acreage for Major Field Crops
    Stacked Bar Chart: Service Delivery
    Line Chart: The Social Security Trustees
    Choropleth Map: Alabama Slavery and Senate Elections
    Dot Plot: The National School Lunch Program
    Dot Plot: GDP Growth in the United States
    Line Chart: Net Government Borrowing
    Table: Firm Engagement
    Conclusion

CONCLUSION
APPENDIX 1: DATA VISUALIZATION TOOLS
APPENDIX 2: FURTHER READING AND RESOURCES
    General Data Visualization Books
    Historical Data Visualization Books
    Books on Data Visualization Tools
    Data Visualization Libraries
    Where to Practice
Acknowledgments
References
Index

BETTER DATA
VISUALIZATIONS
INTRODUCTION
Raise your hand if your approach to creating a graph goes something like this: You analyze
some data. Write up the results. Make a graph and drop it into the report, surrounded
by text. Label it something benign like “Figure 1. Average Earnings, 1990–2020.”
Save it as a PDF. Post it to the world.
It might have taken you months or even years to compile and analyze the data and write
the report. For many, it takes far less time to design the graphs that showcase that data. You
might open a program like Microsoft Excel, paste in the data, click through the drop-down
menu, select one you’ve used dozens or hundreds of times, accept the default formatting,
and paste it into the report.
But at any point in this sequence did you pause to consider what’s most important about
communicating the work? It’s the audience. People will read your report. People will listen to
you discuss your work. And yet many of us spend far too little time thinking about how we can
best present our findings. Instead we use whatever default approach is quickest and easiest.
Why is this? Maybe you don’t believe you have the technical skills or design know-how
to create complex, attractive graphs. Or you worry it’s not worth the effort, because your
managers or tenure committee or whoever else won’t see it as time well spent. Many people
simply think that their reader will just “get it,” as if everyone has seen the content a hundred
times before. But many readers, especially those who can make change or implement policy,
may have never seen this content before. In these cases—which are probably most of them—
thinking carefully about how data is presented is just as important as the data itself.
This book is about how to create better, more effective visualizations of your data. It aims
to expand your graphic literacy and put more graphs in your toolbox. The next time you open
Excel, Tableau, R, or whatever your software tool of choice, you won’t be bound by the graphs
in the dropdown menus or the tutorial manual. This book will guide you to choose the graph
that is the best fit for your data and will most effectively communicate your message.
People often tell me they can’t create some of these different, nonstandard graphs because
their colleague or manager or audience won’t understand them. We are not born knowing
instinctively how to read a bar chart or line chart or pie chart. As Scott Klein, deputy manag-
ing editor at ProPublica once wrote, “There is no such thing as an innately intuitive graphic.
None of us are born literate in reading visualizations.”
As data visualization creators, we must understand our audience and know when a differ-
ent graph can engage readers—and help them expand their own graphic literacy.

This book has three main parts. Part 1 covers general guidelines to creating effective visu-
alizations. We’ll learn the importance of our audience and how to consider what category
of graph will best meet their needs. No data visualization book will contain every lesson to
create effective graphs, but there are some best practices that can guide your work. As you go
forward creating more visuals and seeing their effect on your audience, you’ll develop your
own aesthetic and learn when to bend or break these guidelines.

Each of these six charts visualizes the same data: The share of people earning minimum
wage or less in each state.

Part 2 is the meat of the book. We will define and discuss more than eighty graphs, cat-
egorized into eight broad categories: Comparisons, Time, Distribution, Geospatial, Rela-
tionship, Part-to-Whole, Qualitative, and Tables. We will see how each graph works and the
advantages and disadvantages of each.
Graphs overlap between these categories—a bar chart, for example, can be used to show
changes over time or comparisons between groups. The categorizations here are based on
a graph’s primary purpose. But even that’s not an objective truth, and your perspective and
situation may differ. I do not discuss every single possible graph—there are many specialized
graphs in fields like architecture, biology, and engineering that are excluded here. Instead,
these chapters cover the most common and flexible graphs that can showcase the sorts of
data most people will need to display.
I tie these chapters together in part 3 with a chapter on building a data visualization style guide
and a chapter on how to pull the different lessons together in a series of graph redesigns. If you’ve
ever written a research paper, or even a book report, you are probably aware of the array of writing
style guides, from the Chicago Manual of Style to the Modern Language Association. These guides
break down writing into component parts and prescribe their proper use. A data visualization style
guide does the same for graphs—defines their parts and how to style and use them. In the final
chapter, we apply the lessons to redesign a series of graphs to improve how they communicate data.
This book will guide you as you explore your data and how it might be visualized. Now
more than ever, content must be visual if it is to travel far. Your clients and colleagues, and
your audiences of policymakers, decisionmakers, and interested readers are inundated with
a flow of information. Visuals cut through that.
Anyone can improve the way they visualize and communicate their data—and you don’t
need a graduate degree in marketing or design or advertising. Take it from me, I started my
career as an economist in the federal government.
HOW I LEARNED TO VISUALIZE MY DATA
Once I settled on declaring my economics major at the University of Wisconsin at Madison
(there was an ill-fated attempt to also be a math major, but I hit a wall at Markov chains),
I knew I wanted to end up in Washington, DC. I wanted to be near the center of public
policy and politics. I wanted to explore the real problems of the day and help craft solutions.
I moved to DC in 2005 to join the Congressional Budget Office (CBO). My job was to
help work on the long-term microsimulation model that is used to examine the Social Secu-
rity system and forecast the long-term finances of the federal budget. The spring of 2005 was
an exciting time to work on Social Security—President George W. Bush had made Social
Security a central component of his second term. In his 2005 State of the Union address, he
said, “We must pass reforms that solve the financial problems of Social Security once and for
all.” Reform would stall later that year, but in the course of my first few months on the job,
my group at CBO estimated and analyzed the effects of dozens of policy proposals.
Five years later, I had expanded my work to include issues around policies that affected
disabled workers, immigration, and food stamps (now called the Supplemental Nutrition
Assistance Program or SNAP). In 2010, three of my colleagues were drafting a special report
on policy options for Social Security. In it, they would show the impact of thirty different
options for reform. One of the central figures in the report would show changes in taxes
received by the system, benefits paid out from the system, the balance between the two, and
other measures of fiscal solvency for these thirty options. It looked something like this:
Author’s rendering of early draft of exhibit from the Congressional Budget Office.
You don’t need to be a government economist to know that members of Congress are
unlikely to read something that looks like a spreadsheet. There are too many rows, too many
columns, too many numbers—too much information. It was right then that I first started
thinking about better ways to present this information.
This was the result. We replaced some numbers with small area charts, which give the
reader an immediate visual impression of each option—which ones increased the solvency
of the program and which ones did not.
Final version of that main exhibit in the Congressional Budget Office report on Social
Security. Notice that there is less data and more graphs.
Source: Congressional Budget Office.
The report worked. We received good feedback from colleagues at CBO and other agen-
cies, as well as readers on Capitol Hill and elsewhere, noting how easy it was to read and
digest the graphs. It was maybe the first time I (and perhaps the agency) thought carefully
and strategically about our data visuals. From there, I started reading books on data visual-
ization, design, color theory, and typography.
Working with our editorial department and designers, we began to improve the
graphs in our basic reports and started creating new report and graph types. We made
infographics—what was then a buzzword referring (sometimes derisively) to longer
graphics that combine data, text, images, and more into a single visual. In 2012, we cre-
ated this infographic to accompany and summarize The Long-Term Budget Outlook, a
109-page report.
One-page infographic about the 2012 Long-Term Budget Outlook from the Congressional
Budget Office.
Source: Congressional Budget Office.
That June, CBO’s director sat in front of the U.S. House Budget Committee to relay the
results of our analysis. As the hearing played on a TV out in the hallway, I suddenly heard
yells of, “Jon! Jon! Come out! Your infographic is on TV!”
And, sure enough, Congressman Chris Van Hollen was holding up the infographic on
C-SPAN, covered with scribbles and notes. The visualization had captured and engaged the
attention of one of the busiest people in America, and someone who could do something
about the pressures facing the federal budget. That was the moment I knew that how we
presented our data could matter as much as the data itself.
In 2014, I moved to the Urban Institute, a nonprofit research institution in Washington,
DC, to spend half of my time conducting research and half of my time in the Communica-
tions department, helping colleagues present and visualize their data.
Since that time, I have conducted hundreds of workshops, delivered lectures around the
globe, and published two books on data communication. The world, it seemed, had seen
what I saw—better visual content and better presentations were the currency of research and
policy adoption. The advance of computing power, social media platforms, and the expanding
media landscape made visual content more important, perhaps even necessary.

Maryland Congressman Chris Van Hollen holding up that Long-Term Budget Outlook infographic
in a House Budget Committee hearing.
Source: C-SPAN2.

Today, I work with people in nonprofits, government agencies, private sector companies,
and everything in between to improve how they create their graphs and communicate their
content. I’ve worked with junior economists and analysts dealing with enormous data sets;
health care workers trying to communicate results to patients, families, and hospital admin-
istrators; human resource representatives working with databases of job-seekers; advertisers
and marketing executives selling products to clients; and many more.
I’ve seen hundreds of different kinds of data visualization challenges. The skills to meet
them, unfortunately, are not yet regularly taught in schools or professional development
programs. But these skills can be learned. We can learn how to read chart types we’ve never
seen before, even if they are complex. And we can learn how to communicate our work in
better and more effective ways.
Eventually, I discovered that one of the most important things I can show people is the
incredibly wide array of graphs available to them. And that is precisely the content of this
book, a survey of more than eighty types of data visualizations, from the familiar to the
nonstandard.
But before we get to the library of graph types, we’ll consider some of the science behind
how we process visual information and some best practices and approaches to visualizing data.
PART ONE
PRINCIPLES OF DATA VISUALIZATION
1. VISUAL PROCESSING AND PERCEPTUAL RANKINGS
Before we start creating our charts and graphs, we need to cover some basic theory of
how the brain perceives visual stimuli. This will guide you as you decide what chart type
is most appropriate to visualize your data.
When we consider how to visualize our data, we must ask ourselves how accurately the
reader can perceive the data values. Are some graphs better equipped to guide the reader to
the specific difference between, say, 2 percent and 2.3 percent? If so, how should we think
about those differences as we create our visualizations?
There’s a thread of research in the data visualization field that explores this very ques-
tion. Based on original research over the past thirty years or so, the image on the next
page shows a spectrum of graphs—or more generally, types of data encodings like dots,
lines, and bars—arrayed by how easily readers can estimate their value. The encodings that
readers can most accurately estimate are arranged at the top, and those that enable more
general estimates are at the bottom.
The rankings are unsurprising. It is easier to compare the data in line charts, bar charts,
and area charts that have the same axis or baseline. When the data are positioned on
unaligned axes (think of a pair of bars that are offset from one another on different
axes), it is slightly harder for us to accurately discern the values.
Farther down the vertical axis are encodings based on angle, area, volume, and color.
You intuitively know this: it’s much easier to discern the exact data values and differences
between values when reading a bar chart than when reading a map where countries are
shaded with different colors.
Enable accurate estimates
    Position along common scales
    Position along identical, nonaligned scales
    Length
    Direction/slope, Angle, Parts of a whole
    Area
    Volume
    Shading and saturation
    Color hue
May enable general estimates
Perceptual ranking diagram. What kind of data visualization you choose to create will
depend on your goals and your audience’s needs, experiences, and expertise. This image is
based on Alberto Cairo (2016) from research by Cleveland and McGill (1984), Heer,
Bostock, and Ogievetsky (2010), and others.
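
One way to keep this ranking at hand while choosing a chart is to write it down as an ordered list. The sketch below is only an illustration: the tier order is copied from the diagram, while the names `PERCEPTUAL_TIERS`, `tier_of`, and `easier_to_estimate` are hypothetical, and treating encodings on the same row of the diagram as ties is an assumption.

```python
# Tiers of visual encodings, ordered from most to least accurately estimated.
# The order follows the perceptual ranking diagram; encodings that share a
# row in the diagram are treated as ties (an assumption of this sketch).
PERCEPTUAL_TIERS = [
    ["position along common scales"],
    ["position along identical, nonaligned scales"],
    ["length"],
    ["direction/slope", "angle", "parts of a whole"],
    ["area"],
    ["volume"],
    ["shading and saturation"],
    ["color hue"],
]


def tier_of(encoding: str) -> int:
    """Return the 0-based tier of an encoding (0 = most accurate estimates)."""
    for index, tier in enumerate(PERCEPTUAL_TIERS):
        if encoding in tier:
            return index
    raise ValueError(f"unknown encoding: {encoding!r}")


def easier_to_estimate(a: str, b: str) -> bool:
    """True if encoding a sits in a more accurate tier than encoding b."""
    return tier_of(a) < tier_of(b)


if __name__ == "__main__":
    # Bars on a shared baseline compared with regions shaded by color.
    print(easier_to_estimate("position along common scales", "color hue"))
```

A lookup like this says nothing about engagement or audience, so it is a starting point for a decision and not the decision itself.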
Standard graphs, like bar and line charts, are so common because they are perceptually
more accurate, familiar to people, and easy to create. Nonstandard graphs—those that use
circles or curves, for instance—may not allow the reader to most accurately perceive the
exact data values.
But perceptual accuracy is not always the goal. And sometimes it’s not a goal at all.
Spurring readers to engage with a graph is sometimes just as important. Sometimes, it’s
more important. And nonstandard chart types may do just that. In some cases, nonstandard
graphs may help show underlying patterns and trends in better ways than standard graphs.
In other cases, the fact that these nonstandard graphs are different may make them more
engaging, which we may sometimes need to first attract attention to the visualization.
This graphic from information designer Federica Fragapane shows the fifty most vio-
lent cities in the world in 2017. The vertical axis measures the population of each city and
the horizontal axis captures the homicide rate per 100,000 people. The number of lines in
each icon represents the number of homicides, and additional colors, shapes, and markers
capture metrics like country of origin (the symbol in the middle of each), region (vertical
dashed line), and change since 2016 (blue for decreases, red for increases). It could be a bar
chart or a line chart or some other chart type. But if it were, would you be inclined to zoom
in, read it closely, and examine it?

Graphic from Federica Fragapane for La Lettura (Corriere della Sera) that shows the fifty
most violent cities in the world. A closer look at the legend follows.

A zoom-in of the graphic from Federica Fragapane. Notice all of the details and data elements
included in each icon. It could be a bar chart or line chart, but would you then be
inclined to zoom in and read it closely?

Data visualization is a mix of science and art. Sometimes we want to be closer to the sci-
ence side of the spectrum—in other words, use visualizations that allow readers to more
accurately perceive the absolute values of data and make comparisons. Other times we may
want to be closer to the art side of the spectrum and create visuals that engage and excite the
reader, even if they do not permit the most accurate comparisons.
Sometimes you must make your visuals interesting and engaging, even at the cost of
absolute perceptual accuracy. Readers may not be as interested in the topic as we hope or
may not have enough expertise to immediately grasp the content. As content creators, how-
ever, our job is to encourage people to read and use the graph, even if we “violate” perceptual
rules that we know will hamper someone’s ability to make the most accurate conclusions.
Thinking about different audience types is not just about distinguishing among decision makers,
scholars, policymakers, and the general public; it also means thinking about different
levels of interest or engagement with the visual itself. As historian Cecelia Watson writes in
her book about the history and use of the semicolon, “What if we thought less about rules
and more about communication, and considered it our obligation to one another to try to
figure out what is really being communicated?”
We should not operate from the assumption that readers will pay attention to everything
in our visual, even if we use a common, familiar chart type. Let’s be honest: People see bar
charts and line charts and pie charts all the time, and those charts are often boring. Boring
graphs are forgettable. Different shapes and uncommon forms that move beyond the bor-
ders of our typical data visualization experience can draw readers in. Reading a graph is not
like the spontaneous comprehension of seeing a photograph. Instead, reading a graph involves
many of the same complex cognitive processes as reading a paragraph.
This isn’t to say we should not concern ourselves with visual perception or allowing our
readers to make the most accurate comparisons, but the goal of engagement can be worth a
lot in its own right. Elijah Meeks, a data visualization engineer, wrote that, “Charts, like any
other communication, need to be compelling to be convincing, and if your bar chart, as opti-
mal as it may be, has been reduced to background noise by the constant hum of bar charts
crossing a stakeholder’s screen, then it’s your responsibility to make it more compelling, even
if it’s not any more precise or accurate than a more simple form.”
Introducing a new or different graph type can also introduce a hurdle to your reader.
These can be big hurdles, like a completely new graph type or an exceptionally unusual
representation of the data. Or they can be small hurdles, graphs that rank lower on the
perceptual-accuracy scale or graphs that people may have only seen a few times before.
To overcome these hurdles, you may need to explain how to read the graph. But that
might be worth it because sometimes different charts attract readers’ attention and pique
their curiosity.

This graphic from an interactive visualization from the Organisation for Economic
Co-operation and Development (OECD) enables users to explore the different metrics
and definitions of what it means to have a “better life.” A more standard chart type, like a
bar chart, might enable easier comparisons, but would it be as much fun?
Source: Organisation for Economic Co-operation and Development

When should you use a nonstandard graph? Likely not for many scholarly purposes,
because they do not enable the most accurate perceptions of the data. For scholarly writing,
accuracy is paramount. We want our reader to clearly and efficiently compare the values we’re
presenting. But in other cases—headline-style or standalone graphics, blog posts, shorter
briefs or reports, or graphs for social media—creating something different may draw people
in and hold their attention just long enough to convey your argument, data, or content.
This visualization from artist and journalist Jaime Serra Palou is a lovely example of
this kind of nonstandard and creative data visualization. He plots his coffee consump-
tion every day over the course of a year by using the stains from his coffee cups. You can
immediately see those parts of the year when he needed an extra burst of caffeine. Yes, a
line chart might convey the same data, but would you pause to spend an extra moment
reading it?
Sometimes you can do both—a nonstandard, attention-grabbing graphic accompanied
by a more familiar graph next to it. What you present and how you present it depends
on your audience. The Serra piece might work as the lead graphic on a book or report
about coffee consumption, but more detailed charts inside might take the form of standard
charts and tables. Some academic research has shown that creating novel graphs, such as
those that enable the user to personalize the content (by inputting their own information) or
are simply more aesthetically appealing, encourages readers to actively process the content.

Artist and journalist Jaime Serra Palou plotted his coffee consumption every day for a year
by using stains from his coffee cup.

ANSCOMBE’S QUARTET
The value of visualizing data is best illustrated by Anscombe’s Quartet, published in 1973 by
statistician Francis Anscombe. The Quartet demonstrates the power of graphs and how they,
together with statistical calculations, can better communicate our data.
Examine the table below, which shows four pairs of data, an X and a Y.
We can make some basic observations about these data. We can see that the first three
series of X’s are all the same; the values of X’s in the last series are all 8 except for the one 19;
and the X’s are all whole numbers while the Y’s are not. We might even notice that the 12.7
value in the third column of Y is larger than the rest. In my experience, most people don’t
comment about the relationship between the different series, which, at the end of the day, is
what we want to understand. It turns out that each of the four pairs yields the same standard
information: the same average values of the X series and the Y series; the same variance for
each; the same correlation between X and Y; and the same estimated regression equation.
Data set             1             2             3             4
Variable            x     y       x     y       x     y       x     y
Obs. no.  1        10   8.0      10   9.1      10   7.5       8   6.6
          2         8   7.0       8   8.1       8   6.8       8   5.8
          3        13   7.6      13   8.7      13  12.7       8   7.7
          4         9   8.8       9   8.8       9   7.1       8   8.8
          5        11   8.3      11   9.3      11   7.8       8   8.5
          6        14  10.0      14   8.1      14   8.8       8   7.0
          7         6   7.2       6   6.1       6   6.1       8   5.3
          8         4   4.3       4   3.1       4   5.4      19  12.5
          9        12  10.8      12   9.1      12   8.2       8   5.6
         10         7   4.8       7   7.3       7   6.4       8   7.9
         11         5   5.7       5   4.7       5   5.7       8   6.9
Mean              9.0   7.5     9.0   7.5     9.0   7.5     9.0   7.5
Variance         11.0   4.1    11.0   4.1    11.0   4.1    11.0   4.1
Correlation        0.816         0.816         0.816         0.817
Regression line  y = 3 + 0.5x  y = 3 + 0.5x  y = 3 + 0.5x  y = 3 + 0.5x
Source: Francis Anscombe
Known as Anscombe’s Quartet, this example demonstrates how difficult it is for us to pull
out basic patterns and summary statistics.
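
The summary rows of the table can be checked with a few lines of code. This is a minimal sketch, not Anscombe's own calculation: it uses the rounded values printed in the table, the helper name `summarize` is hypothetical, and it assumes the sample variance (dividing by n - 1), which is what reproduces the 11.0 shown for x.

```python
from math import sqrt

# Values copied from the table above (already rounded to one decimal place).
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X_123, [8.0, 7.0, 7.6, 8.8, 8.3, 10.0, 7.2, 4.3, 10.8, 4.8, 5.7]),
    2: (X_123, [9.1, 8.1, 8.7, 8.8, 9.3, 8.1, 6.1, 3.1, 9.1, 7.3, 4.7]),
    3: (X_123, [7.5, 6.8, 12.7, 7.1, 7.8, 8.8, 6.1, 5.4, 8.2, 6.4, 5.7]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.6, 5.8, 7.7, 8.8, 8.5, 7.0, 5.3, 12.5, 5.6, 7.9, 6.9]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),   # sample variance, an assumption (see lead-in)
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(
            f"set {name}: mean_x={s['mean_x']:.1f} mean_y={s['mean_y']:.1f} "
            f"var_x={s['var_x']:.1f} var_y={s['var_y']:.1f} "
            f"corr={s['corr']:.2f} y = {s['intercept']:.1f} + {s['slope']:.1f}x"
        )
```

Because the inputs are rounded, the third decimal of each correlation can differ slightly from the table, but the four printed lines should be nearly indistinguishable from one another, which is exactly the point of the Quartet.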
When we see the same data presented in four graphs, however, we can immediately see
these relationships, for example, the positive correlation in all four pairs, the curvature in the
second pair that you couldn’t see in the table, and the outliers 12.7 and 19.0.
We are much more likely to remember these four small graphs than we are the origi-
nal table. In his bestselling book, Brain Rules, molecular biologist John Medina writes,
“The more visual the input becomes, the more likely it is to be recognized and recalled.”
The more we can make our data and content visual, the more we can expect our readers to
remember it and, hopefully, use it.
[Figure: four scatterplots, one for each data set. Each horizontal axis runs from 0 to 20 and each vertical axis from 0 to 15.]
The data visualization representation of Anscombe’s Quartet. Notice how much easier
it is to see the positive relationship between the two variables, the curvature in the
pattern in the top-right graph, and the outliers in the bottom two graphs.
Source: Francis Anscombe (1973).
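
Even a crude plot makes the point. The sketch below draws the second data set on a character grid using only the Python standard library; in practice you would use a plotting library, and the function name `text_scatter` and the one-character-per-unit grid are assumptions made to keep the example self-contained.

```python
# Data set 2 from the table above (values rounded to one decimal place).
XS = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
YS = [9.1, 8.1, 8.7, 8.8, 9.3, 8.1, 6.1, 3.1, 9.1, 7.3, 4.7]


def text_scatter(xs, ys, x_max=20, y_max=15):
    """Draw points on a character grid, one row per unit of y."""
    grid = [[" "] * (x_max + 1) for _ in range(y_max + 1)]
    for x, y in zip(xs, ys):
        # Snap each point to the nearest whole-number cell.
        grid[round(y)][round(x)] = "o"
    lines = []
    # Emit the highest y value first so the plot reads bottom-up.
    for row in range(y_max, -1, -1):
        lines.append(f"{row:2d} |" + "".join(grid[row]))
    lines.append("   +" + "-" * (x_max + 1))
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_scatter(XS, YS))
```

The points rise, flatten, and then dip at the largest x value, a curve that the summary statistics alone never reveal.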
GESTALT PRINCIPLES OF VISUAL PERCEPTION
How do we perceive information? And how, as chart creators, can we use these perceptual
rules to more effectively communicate our data? “Gestalt theory” is one such way we can
think about how our readers will look at our graphs. Gestalt theory was developed in the
early part of the twentieth century by German psychologists and refers to how we tend to
organize visual elements into groups. Further developments in the field were interrupted by
the rise of the Nazi regime in Germany and then by World War II, and after the war it was
criticized for lacking rigorous methods. But the ideas persist in many
disciplines, including information theory, vision science, and cognitive neuroscience.
These six organizational principles from Gestalt theory are especially useful for creating
graphs and visuals that tap into our reader’s visual processing network.
PROXIMITY
We perceive objects that are close to one another as belonging to a group. There are lots of
graphical elements that we can group together: labels with points, bars with each other, or,
like this graph, clusters of points in a scatterplot in which we can see two groups or clusters,
one in the top-right and the other closer to the bottom-left.
SIMILARITY
Our brains group objects that share the same color, shape, or direction. Adding color to
the above scatterplot reinforces the two groups.
ENCLOSURE
Bounded objects are perceived as a group. Here, in addition to using color, we can enclose
the two groups with circles or other shapes.
CLOSURE
Our brains tend to ignore gaps and complete structures with open areas. In its basic form,
we don’t have a problem viewing a simple graph that has a horizontal axis and a vertical axis as
a single object because the two lines are enough for us to define the closed space. In a line chart
with missing data, for example, we tend to mentally close the gap in the most direct way pos-
sible, even if there might be something different going on in that missing area. For example, in
the line graph on the left, we mentally close the gap between the two segments with a straight
line even though the missing data might yield a pattern that moves up and then down.
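
The straight line our minds draw across a gap is the same thing as linear interpolation. The sketch below makes that explicit with made-up numbers: `fill_gap_linearly` is a hypothetical helper, and both the `observed` and `actual` series are invented for illustration.

```python
def fill_gap_linearly(series):
    """Replace runs of None with points on a straight line between neighbors."""
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            start = i - 1            # last known value before the gap
            end = i
            while end < len(filled) and filled[end] is None:
                end += 1             # first known value after the gap
            if start < 0 or end == len(filled):
                raise ValueError("a gap at the edge of the series cannot be bridged")
            step = (filled[end] - filled[start]) / (end - start)
            for k in range(i, end):
                filled[k] = filled[start] + step * (k - start)
            i = end
        else:
            i += 1
    return filled


# Hypothetical data: three missing points in the middle of a series.
observed = [10, 12, None, None, None, 20, 21]
# What the missing stretch might really have looked like: up, then down.
actual = [10, 12, 18, 25, 17, 20, 21]

if __name__ == "__main__":
    print("what we imagine:", fill_gap_linearly(observed))
    print("what happened:  ", actual)
```

The imagined series climbs steadily through the gap while the invented "actual" series peaks and falls back, which is why a line chart should make missing data visible instead of silently bridging it.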
CONTINUITY
Here, objects that are aligned together or continue one another are perceived as a group.
Hence, our eyes seek a smooth path when following a sequence of shapes. You don’t need
the horizontal axis line in this bar chart, for example, because the bars are aligned along a
consistent path between the labels and the bottoms of the bars.
[Figure: bar chart with bars labeled A, B, C, D, and E.]
CONNECTION
According to this principle, we perceive connected objects as members of the same group.
Take this series of dots: At first, we perceive it as a single series, a mass of blue dots. Adding
color makes it clear there are two different series. Connecting the dots makes it clear how the
two initially track each other but then diverge.
PREATTENTIVE PROCESSING
The concept of “preattentive processing” is a subset of Gestalt theory, and it is the visual process
I consider most when creating my data visualizations. As we just saw, because our eyes
can detect a limited set of visual characteristics, we combine various features of an object
and unconsciously perceive them as a single image. In other words, preattentive attributes
draw our attention to a specific part of an image or, in our case, a graph.
For example, try to find the four largest numbers in this table.
Table 1. Our sales grew to $600 million this year

            Q1    Q2    Q3    Q4
Bob         26    35    72    84
Ellie       22    15    61    35
Gerrie      19    20    71    55
Jack        22    95    13    64
Jon         83    62    46    48
Karen       30    65    98    82
Ken         38    28    45    71
Lauren      98    81    41    63
Steve       16    50    23    41
Valerie     46    24    30    57
Total     $400  $475  $500  $600
Hard to do, right? Now try it with these versions that use color (on the left) and intensity
(on the right) to highlight those four numbers.
[Figure: two copies of Table 1 shown side by side. The four largest numbers are highlighted with color in the left copy and with heavier type in the right copy.]
Preattentive attributes here direct our attention to the large numbers immediately.
It’s easier to find the numbers in these two tables than the first because the numbers are
encoded using preattentive attributes: color and weight. Each distinction helps us effortlessly
identify the key number.
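The same exercise can be sketched in code. The Python example below is illustrative only: it transcribes the values from Table 1, finds the four largest cells, and wraps them in asterisks as a plain-text stand-in for color or weight. The names `top_cells` and `render` and the asterisk convention are assumptions of this sketch, not something the tables above use.

```python
# Sales figures transcribed from Table 1 (rows are people, columns are Q1 to Q4).
SALES = {
    "Bob": [26, 35, 72, 84],
    "Ellie": [22, 15, 61, 35],
    "Gerrie": [19, 20, 71, 55],
    "Jack": [22, 95, 13, 64],
    "Jon": [83, 62, 46, 48],
    "Karen": [30, 65, 98, 82],
    "Ken": [38, 28, 45, 71],
    "Lauren": [98, 81, 41, 63],
    "Steve": [16, 50, 23, 41],
    "Valerie": [46, 24, 30, 57],
}
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]


def top_cells(sales, n=4):
    """Return the (name, quarter_index) positions of the n largest values."""
    cells = [
        (value, name, q)
        for name, row in sales.items()
        for q, value in enumerate(row)
    ]
    # Largest first; the sort is stable, so tied values keep table order.
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return {(name, q) for _, name, q in cells[:n]}


def render(sales, highlight):
    """Build a plain-text table, wrapping highlighted cells in asterisks."""
    lines = [f"{'':<8}" + "".join(f"{q:>6}" for q in QUARTERS)]
    for name, row in sales.items():
        cells = [
            f"*{v}*" if (name, q) in highlight else str(v)
            for q, v in enumerate(row)
        ]
        lines.append(f"{name:<8}" + "".join(f"{c:>6}" for c in cells))
    return "\n".join(lines)


highlight = top_cells(SALES)
print(render(SALES, highlight))
```

Even in a monospaced printout, the marked cells are easier to find than in the unmarked table, which is the point of encoding the key numbers with a distinct visual attribute. In a real table, color or bold type would do the job far better than asterisks.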
[Figure: grid of preattentive attributes: shape, enclosure, line width, saturation, color, size, markings, orientation, position, 3D, length, curvature, density, closure, and sharpness.]
Examples of preattentive attributes that we can use in our visualizations to direct our
reader’s attention.
Preattentive attributes are effects that seem to pop out from their surroundings. There are
many we can use to tap into our reader’s visual processing network to draw their attention:
shape, line width, color, position, length, and more.
Preattentive processing works in photographs too. Consider these images of fruits and
vegetables. In the photo on the left, the eye is drawn to the upper-right corner. The group of
tomatoes is slightly larger than the rest and positioned away from the group. In the photograph
on the right, however, the eye is not drawn to any specific position. This photograph
is more evenly balanced, so no one object stands apart from the rest.
Notice how your eye gravitates toward the four tomatoes in the top-right part
of the image on the left. The image on the right is balanced, so your eye doesn’t
immediately focus on any particular area. Photos by NordWood Themes (left) and Tim
Gouw (right) on Unsplash.
We can apply these attributes to data visualization. A line chart uses the position
of the points to indicate the data, while a bar chart uses length. You can use preattentive
attributes to draw your audience’s attention to aspects of your graphs, guiding
their focus.
For example, on the next page, I can enclose the ‘Forecast’ area of the line chart on the left
with the gray box—notice how it immediately draws your eye to the right side of the graph.
Similarly, I can use the color attribute to highlight a few points in the scatterplot on the right
(and keep the other dots gray).
[Figure: line chart titled “US Real GDP growth is projected to decline and stabilize around 1.7%.” The vertical axis runs from 0.0 to 3.5 and the horizontal axis from 2010 to 2024, with the “Actual” and “Forecast” periods labeled. Source: Congressional Budget Office.]
Applying simple preattentive attributes to these graphs directs your eye to the “Forecast”
area of the graph on the left and to the two highlighted countries in the graph on the right.
WRAPPING UP
With these basic rules of perception, we are now better equipped to recognize and interpret
the visual features we can use to encode and highlight our data. Before we start adding more
graphs to our data visualization toolbox, let’s lay out some basic guidelines for more effective
data visualizations: things you should keep in mind no matter what kind of graph you
are creating.
2. FIVE GUIDELINES FOR BETTER DATA VISUALIZATIONS
Whenever I create a data visualization, whether it’s static, interactive, or part of a
report or blog post or even a tweet, I follow five primary guidelines.
1. Show the data
2. Reduce the clutter
3. Integrate the graphics and text
4. Avoid the spaghetti chart
5. Start with gray
Showing the data and reducing the clutter means reducing extraneous gridlines, markers,
and shades that obscure the actual data. Active titles, better labels, and helpful annotations
will integrate your chart with the text around it. When charts are dense with many data
series, you can use color strategically to highlight series of interest or break one dense chart
into multiple smaller versions.
Taken together, these five guidelines remind me of the needs of my audience and how my
visuals can tell them a story.
GUIDELINE 1: SHOW THE DATA
Your reader can only grasp your point, argument, or story if they see the data. This doesn’t
mean that all the data must be shown, but it does mean that you should highlight the values
that are important to your argument. As chart creators, our challenge is deciding how much
data to show and the best way to show it.
Consider this dot density map of the United States (see page 244 for more on this kind
of map). It uses data from the 2010 U.S. decennial census and places a dot for each of the
country’s 308 million residents in their census blocks (a census block roughly corresponds
to a city block). Notice how there is nothing in the image except for the data. There are no
state borders, roads, city markers, or labels for lakes and rivers. We still recognize it as the
United States because people tend to live along borders and coasts, which helps give shape
to the country.
The Gestalt principle of similarity helps us see the clusters of people around the country.
Source: Image Copyright, 2013, Weldon Cooper Center for Public Service, Rector and Visitors of the
University of Virginia (Dustin A. Cable, creator).
This doesn’t mean we must show all of the data all the time. Sometimes charts show too
much data, making it hard to see which data points matter most. On the next page are
two line charts that both show the average number of years of schooling for fifty countries
around the world. In the graph on the left, each country is assigned its own color. This makes
it busy and confusing, impossible to pick out a trend for any single country. In the graph on
the right, just six countries of interest are highlighted while the remaining are set in gray,
blending them into a neutral background. This gives the reader a clear view of the countries
we want to highlight. It’s not about showing the least amount of data, it’s about showing the
data that matter most.
[Figure: two line charts, each titled “Average years of schooling has increased around the world” (number of years, 0 to 16, for 1997 to 2017). The chart on the right labels Germany, United States, Spain, Mexico, China, and Nepal. Source: Our World in Data.]
Highlighting just a few countries in the chart on the right makes it easier to read.
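The highlight-on-gray approach can be expressed as a small styling rule, separate from whatever tool draws the chart. The Python sketch below is one possible way to do it, not the method used to produce the charts in this chapter. The six highlighted country names come from the right-hand chart; the function name `series_styles`, the hex colors, the line widths, and the placeholder names for the remaining countries are all assumptions made for illustration.

```python
# Countries labeled in the right-hand chart. The palette is hypothetical.
HIGHLIGHTED = ["Germany", "United States", "Spain", "Mexico", "China", "Nepal"]
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]
GRAY = "#c8c8c8"


def series_styles(all_series, highlighted, palette=PALETTE, gray=GRAY):
    """Map each series name to a style dict.

    Highlighted names get their own color and a heavier line; every other
    series is drawn thin and gray so it recedes into the background.
    """
    if len(highlighted) > len(palette):
        raise ValueError("not enough palette colors for the highlighted series")
    colors = dict(zip(highlighted, palette))
    styles = {}
    for name in all_series:
        if name in colors:
            # zorder is a drawing-order hint: highlighted lines go on top.
            styles[name] = {"color": colors[name], "linewidth": 2.0, "zorder": 2}
        else:
            styles[name] = {"color": gray, "linewidth": 0.8, "zorder": 1}
    return styles


# Placeholder names stand in for the other countries in the data set.
all_series = HIGHLIGHTED + ["Country 7", "Country 8"]
styles = series_styles(all_series, HIGHLIGHTED)
```

The keys `color`, `linewidth`, and `zorder` were chosen because plotting libraries such as matplotlib accept keyword arguments with those names, but the rule itself does not depend on any particular library. Keeping the decision about which series matter in one list makes it easy to change the emphasis without touching the rest of the chart.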
GUIDELINE 2: REDUCE THE CLUTTER
The use of unnecessary visual elements distracts your reader from the central data and clutters
the page. There are lots of different types of chart clutter we might want to avoid. There
are basic elements like heavy tick marks and gridlines, which we should remove in almost
every case. Some graphs use data markers like squares, circles, and triangles to distinguish
between series, but when the markers overlap they jumble the patterns. Some use textured
or filled gradients when simple, solid shades of color work just as well. Some use unnecessary
dimensions that distort the data. And others contain too much text and too many
labels, cluttering the space and crowding out the data.
Take this three-dimensional column chart of average schooling for the United States and
Germany for a few select years.
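[Figure: three-dimensional column chart titled “Average years of schooling has grown faster in Germany than in the United States” (number of years), showing Germany and the United States for 1997, 2007, and 2017. The vertical axis is labeled from 0.00% to 1500.00%, while the data labels range from 10,000 to 14,100.]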
You’ve seen these kinds of 3D charts before—they are distracting, hard to read,
and distort the data.
If you think that this looks so outlandish that no one would ever style a chart this way,
you’d be wrong. I’ve copied the exact style from another chart, even down to the gradient styling.
The three-dimensional bars and shimmering stripes, mismatched data and axis labels,
the abundance of decimals that suggest a level of data precision that’s not actually there—
all these combine to create a graph that is difficult to read and, quite honestly, uncomfortable
to look at. Also notice how the three-dimensional view distorts the data. The first bar never
touches the gridline even though it should match it exactly. This distortion occurs because
the unnecessary third dimension requires adding perspective to the graph. Simplifying the
graph by discarding these extraneous, distracting elements and showing the data makes
your argument clear and comprehensible.
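The surplus decimals in particular can be trimmed mechanically. The Python sketch below shows one way to do that, assuming the tick values are already known: it picks the smallest number of decimal places that still represents every tick exactly. The function name `format_ticks` and the `max_decimals` cap are hypothetical, and the example ticks are simply the numeric values from the axis of the chart above with the mismatched percent sign left off.

```python
def format_ticks(ticks, max_decimals=4):
    """Format axis tick values with no more decimals than they need.

    Tries 0, 1, 2, ... decimal places and stops at the first precision
    where every tick is unchanged by rounding (within a small tolerance).
    """
    decimals = 0
    for decimals in range(max_decimals + 1):
        if all(abs(round(t, decimals) - t) < 1e-9 for t in ticks):
            break
    return [f"{t:.{decimals}f}" for t in ticks]


# Axis values from the chart above, without the percent formatting.
labels = format_ticks([0.0, 500.0, 1000.0, 1500.0])

# Ticks that do need one decimal place keep it.
fine_labels = format_ticks([0.0, 0.5, 1.0, 1.5])
```

With whole-number ticks the labels carry no decimal places at all, and a decimal appears only when at least one tick requires it. This addresses only the labeling problem; it does nothing about the distortion introduced by the third dimension.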
While much of our understanding of perception and how our eyes and brains work is
rooted in scientific research, our decisions of which graph to use, where we place labels and
annotation, which colors and fonts to use, and how we lay out our visualizations is mostly
subjective. There are cases where certain graphs are wrong, but many other cases call for
