The Data Literacy Cheat Sheet: Charts: Which One Should You Use?



Charts: Which one should you use?

Comparing multiple values: Column, Bar, Circular area, Line, Scatter, Bullet
Displaying the composition of a whole: Area, Waterfall, Pie, Stacked bar, Stacked column
Showing the relationship between sets: Line, Scatter, Bubble
Showing distribution of values: Column, Bar, Line, Scatter
Analyzing trends: Column, Line, Dual-axis line

REMEMBER! The five characteristics of good data:
• Credible
• Actionable
• Unbiased
• Statistically relevant
• Easy to interpret

Averages: Which one should you use?

An average, also referred to as a “measure of central tendency”, is a value that attempts to identify the central position within a set of data. Mean, median and mode are types of average.

MEAN Does your data have a continuous distribution that’s relatively symmetrical? Use MEAN (often referred to as just ‘average’).
MEDIAN Does your data contain significant outliers? Use MEDIAN, which is less influenced by them.
MODE represents the most common value in a dataset. If you’re dealing with nominal data (non-numeric categories like “industry vertical”), MODE is the only appropriate average to use.
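
The guidance above on choosing between mean, median and mode can be tried out with a minimal Python sketch. The figures and category names below are hypothetical and only illustrate how an outlier pulls the mean but not the median.

```python
from statistics import mean, median, mode

# Hypothetical monthly revenue per customer, with one large outlier (950).
revenue = [20, 25, 25, 30, 35, 950]

revenue_mean = mean(revenue)      # pulled upwards by the outlier
revenue_median = median(revenue)  # barely affected by the outlier
print(revenue_mean, revenue_median)

# Nominal (non-numeric) data: the mode is the only average that applies.
verticals = ["SaaS", "Retail", "SaaS", "Media", "SaaS"]
most_common_vertical = mode(verticals)
print(most_common_vertical)
```

With these made-up numbers the mean lands far above every typical value, which is the situation where the median is the safer summary.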

CHARTING TIPS
Weighted average
A weighted average is a type of MEAN, where some values in the data set are given more influence than others. Each value to
be averaged is given a weight, representing the importance of that value in the average.
Weighted averages are important when you are dealing with frequencies or distributions, or when working with data that’s
unequal in some way.
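
As a rough illustration of the weighted average described above, the sketch below divides the sum of each value multiplied by its weight by the sum of the weights. The helper name `weighted_mean` and the plan prices and customer counts are hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: price of each plan, weighted by its number of customers.
plan_prices = [10, 50, 200]
customers = [80, 15, 5]

average_price = weighted_mean(plan_prices, customers)
simple_average = sum(plan_prices) / len(plan_prices)
print(average_price, simple_average)
```

Because most of the hypothetical customers are on the cheapest plan, the weighted result sits much closer to that plan's price than the simple average does.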

Common cognitive biases


CONFIRMATION BIAS
The tendency to search for results that confirm your preconceptions.

OBSERVATION BIAS
The tendency to see what we expect (or want) to see in results.

FUNDING BIAS
The tendency to support the interests of a study's financial sponsor or business.

SELECTION BIAS
The tendency to select data for analysis that is not properly random.

SAMPLING BIAS
(A type of Selection bias) The tendency to collect a sample of data in such a way that some members of the population are less likely to be included than others.

How to question data

SOURCE Do you know where the data came from?
SCALES Are the scales of each axis clear and effective?
FILTERS Have any specific filters been applied to the data set?
TIMEFRAME What is the date range for the presented data?
GAPS Are there obvious omissions to the data set?
EXCESS Is there anything presented that's not relevant?
UNIT(S) Is it clear what the data in the chart represent?
LABELS Is the data clearly titled and labelled, in a descriptive way?
ACTIONABLE Can the insights presented be used in an actionable way?
TREND Is it trending upwards, downwards or flat?
PATTERNS Are there cyclic patterns (e.g. seasonality) in the data?
DIMENSIONS Is the data segmented into clear, meaningful dimensions, e.g. “Pricing plan”?

CHARTING TIPS

Should I truncate the Y axis?


Truncating the Y axis (i.e. not starting at zero) is controversial in the world of data visualization. It can be misinterpreted and has been used to mislead consumers in a number of cases. In general, it’s not a good idea unless you can clearly show that the axis doesn’t start at zero.

• It can be acceptable to truncate when you’re displaying data that would never feasibly reach zero anyway, e.g. global average temperature.
• If you need to emphasise small changes and trends in data, consider showing a chart of the change rather than absolute numbers.

NEVER truncate the Y axis when:
1. Using a bar or column chart (bar charts should start at zero).
2. The information is likely to be misinterpreted.
3. It doesn't help in some way with understanding the chart.
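
One way to chart the change rather than the absolute numbers, as suggested above, is to convert the series into period-over-period percentage changes first. This is a minimal sketch; the helper name `percent_changes` and the monthly totals are hypothetical, and it assumes no value in the series is zero.

```python
def percent_changes(values):
    """Return the percentage change between each consecutive pair of values.

    Assumes every value except possibly the last is non-zero.
    """
    return [
        (current - previous) / previous * 100
        for previous, current in zip(values, values[1:])
    ]


# Hypothetical monthly totals that would look almost flat on a zero-based axis.
monthly_totals = [1000, 1010, 1025, 1020]

changes = percent_changes(monthly_totals)
print([round(change, 2) for change in changes])
```

Plotting `changes` on an axis that includes zero makes small rises and falls visible without truncating the axis of the original chart.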

Data correlation
Correlation is a statistical relationship between two data sets. Correlation can have a numeric value on a scale from -1 to 1.
A POSITIVE correlation is present when both values increase together, whereas in a NEGATIVE correlation, one value decreases
as the other increases.

Name: Perfect negative correlation | No correlation | Perfect positive correlation
Value: -1 | 0 | 1
Values between -1 and 0 indicate a negative correlation; values between 0 and 1 indicate a positive correlation.
[Figure: typical chart for each correlation value]

CHARTING TIPS
Correlation is not causation!
A strong correlation between two data sets does not necessarily mean that one thing causes the other (causation). There could be other reasons the data has a strong correlation.
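
The -1 to 1 value described in this section can be computed in several ways. The sketch below assumes the Pearson correlation coefficient, which is a common choice but is not named in the text above. The function name and the sample figures are hypothetical.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Hypothetical series: both rise together, then one falls as the other rises.
ad_spend = [1, 2, 3, 4, 5]
signups = [2, 4, 6, 8, 10]
churn = [10, 8, 6, 4, 2]

positive = pearson_correlation(ad_spend, signups)
negative = pearson_correlation(ad_spend, churn)
print(round(positive, 3), round(negative, 3))
```

A value near 1 or -1 from this calculation says only that the two series move together; it does not show that one causes the other.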

Glossary
QUALITATIVE DATA is descriptive - it describes something, e.g. Reason for customer cancellation.
QUANTITATIVE DATA is always numerical (involves numbers), e.g. Revenue lost from customer cancellations.
DISCRETE DATA can only take certain values (like whole numbers), e.g. Number of customers churned.
CONTINUOUS DATA can take any value within a given range, e.g. Customer churn rate.
CATEGORICAL DATA can be sorted according to defined groups or categories, e.g. Industry vertical.
STATISTICAL SIGNIFICANCE is when the observed outcome of an experiment is unlikely to have occurred due to chance.
This is an important factor when running multi-variant (A/B) tests on your product or website.
Determining statistical significance can be complex. We recommend using a free tool such as AB Testguide: https://abtestguide.com/calc
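
To give a feel for what such a calculator does, here is a minimal sketch of one common approach, a two-sided two-proportion z-test. This is an assumption for illustration: it is not necessarily the method the tool above uses, and the function name, visitor counts and the 0.05 threshold mentioned in the comments are hypothetical choices.

```python
from math import erf, sqrt


def two_proportion_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return the two-sided p-value for a difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the assumption that both variants convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if standard_error == 0:
        raise ValueError("cannot test when the pooled rate is 0 or 1")
    z = (rate_b - rate_a) / standard_error
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


# Hypothetical A/B test: 100 of 1000 visitors convert on A, 130 of 1000 on B.
p_value = two_proportion_p_value(100, 1000, 130, 1000)
print(round(p_value, 4))

# A common (but arbitrary) convention treats p < 0.05 as significant.
is_significant = p_value < 0.05
print(is_significant)
```

A smaller p-value means the observed difference would be less likely if the two variants really performed the same. Sample size planning and repeated checks of a running test are further complications this sketch ignores.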

More resources
The Data Viz Checklist by Stephanie Evergreen: http://stephanieevergreen.com/wp-content/uploads/2016/10/DataVizChecklist_May2016.pdf
Graph Choice Chart by Tuva Labs: https://tuvalabs.com/static/documents/Graph_Choice_Chart.pdf

Try the World's Best Subscription Analytics App for FREE at chartmogul.com
