Dataviz Cheatsheet
Data visualization is the graphic representation of data. It involves producing images that communicate
relationships among the represented data to viewers. Visualizing data is an essential part of data analysis and
machine learning, but choosing the right type of visualization is often challenging. This guide provides an
introduction to popular data visualization techniques by presenting sample use cases and providing code
examples using Python.
Line Chart
Scatter Plot
Histogram and Frequency Distribution
Heatmap
Contour Plot
Box Plot
Bar Chart
Import libraries
Matplotlib: Plotting and visualization library for Python. We'll use the pyplot module from matplotlib. By
convention, it is imported as plt.
Seaborn: An easy-to-use visualization library that builds on top of Matplotlib and lets you create polished
charts with just a few lines of code.
# Import libraries
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
# Configuring styles
sns.set_style("darkgrid")
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
Line Chart
A line chart displays information as a series of data points or markers connected by straight lines. You can
customize the shape, size, color and other aesthetic elements of the markers and lines for better visual clarity.
Example
We'll create a line chart to compare the yields of apples and oranges over 12 years in the imaginary region of
Hoenn.
# Sample data
years = range(2000, 2012)
apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931, 0.934, 0.936, 0.937, 0.9375, 0.9372,
oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908, 0.907, 0.904, 0.901, 0.898, 0.9, 0
# First line
plt.plot(years, apples, 'b-x', linewidth=4, markersize=12, markeredgewidth=4, markeredg
# Second line
plt.plot(years, oranges, 'r--o', linewidth=4, markersize=12);
# Title
plt.title('Crop Yields in Hoenn Region')
# Line labels
plt.legend(['Apples', 'Oranges'])
# Axis labels
plt.xlabel('Year'); plt.ylabel('Yield (tons)');
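A minimal self-contained sketch of the same kind of chart. The yield values below are made-up placeholders (an assumption, not the Hoenn figures above), and the Agg backend line is only there so the snippet also runs as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import matplotlib.pyplot as plt

# Hypothetical yields for 6 years; placeholder values only
years = list(range(2000, 2006))
apples = [0.90, 0.91, 0.92, 0.93, 0.93, 0.94]
oranges = [0.96, 0.94, 0.93, 0.92, 0.92, 0.91]

fig, ax = plt.subplots()
# The format string combines color, line style and marker:
# 'b-x' is a solid blue line with x markers
apple_line, = ax.plot(years, apples, 'b-x', linewidth=2, markersize=8)
# 'r--o' is a dashed red line with circle markers
orange_line, = ax.plot(years, oranges, 'r--o', linewidth=2, markersize=8)

ax.set_title('Crop Yields (placeholder data)')
ax.legend(['Apples', 'Oranges'])
ax.set_xlabel('Year')
ax.set_ylabel('Yield (tons)')
```

Working on an explicit Axes object (ax) instead of the global plt functions is a style choice here; both approaches produce the same chart.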
Scatter Plot
In a scatter plot, the values of 2 variables are plotted as points on a 2-dimensional grid. Additionally, you can
use a third variable to determine the size or color of the points.
Example
The Iris flower dataset provides sample measurements of sepals and petals for 3 species of flowers. The Iris
dataset is included with the seaborn library, and can be loaded as a pandas dataframe.
We can use a scatter plot to visualize how sepal length and sepal width vary across different species of flowers. The
points for each species form a separate cluster, with some overlap between the Versicolor and Virginica species.
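One way such a scatter plot could be drawn is sketched below. To stay self-contained, it uses a small hand-written stand-in dataframe (hypothetical values) with the same column names as the seaborn Iris data. With the real data, sns.load_dataset("iris") could replace the stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in rows; column names follow the seaborn Iris dataset
flowers = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.0, 6.4, 5.9, 6.1, 6.9, 7.1, 6.5],
    "sepal_width":  [3.5, 3.0, 3.4, 3.2, 3.0, 2.8, 3.1, 3.0, 3.0],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# hue colors each point by species (the "third variable"); s sets the marker size
ax = sns.scatterplot(data=flowers, x="sepal_length", y="sepal_width",
                     hue="species", s=100)
ax.set_title("Sepal Dimensions")
```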
Histogram and Frequency Distribution
Example
We can use a histogram to visualize how the values of sepal width are distributed.
sns.distplot(data.sepal_width, kde=False);
We can immediately see that values of sepal width fall in the range 2.0 - 4.5, and around 35 values are in the range
2.9 - 3.1. We can also look at this data as a frequency distribution, where the values on the Y-axis are densities
(scaled so that the total area of the bars is 1) instead of counts.
sns.distplot(data.sepal_width);
Heatmap
A heatmap is used to visualize 2-dimensional data like a matrix or a table using colors.
Example
We'll use another sample dataset from Seaborn, called "flights", to visualize monthly passenger footfall at an
airport over 12 years.
# Chart Title
plt.title("No. of Passengers (1000s)")
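A heatmap needs its data in matrix form, so long-form records are usually pivoted first. The sketch below illustrates that step with a tiny hypothetical table. The column names year, month and passengers are assumed to mirror the "flights" dataset, and the numbers are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import pandas as pd
import seaborn as sns

months = ["Jan", "Feb", "Mar"]
# Hypothetical long-form records: one row per (year, month) pair
records = pd.DataFrame({
    "year": [2000] * 3 + [2001] * 3 + [2002] * 3,
    "month": months * 3,
    "passengers": [110, 120, 135, 125, 130, 150, 140, 145, 170],
})

# Pivot to a month x year matrix, then restore calendar order of the rows
table = records.pivot(index="month", columns="year",
                      values="passengers").reindex(months)

# annot=True writes each value into its cell; fmt="d" formats them as integers
ax = sns.heatmap(table, annot=True, fmt="d", cmap="Blues")
ax.set_title("No. of Passengers (placeholder data)")
```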
Contour Plot
Example
We can visualize the values of sepal width and sepal length from the flowers dataset using a contour plot. The shades
of blue represent the density of values in a region of the graph.
plt.title("Flowers")
Box Plot
A box plot shows the distribution of data along a single axis, using a "box" and "whiskers". The lower end of the box
represents the 1st quartile (i.e. 25% of values are below it), and the upper end of the box represents the 3rd quartile
(i.e. 25% of values are above it). The median value is represented via a line inside the box. The "whiskers" represent
the minimum & maximum values (sometimes excluding outliers, which are represented as dots).
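The quantities described above can be computed by hand, as in the sketch below. The sample values are made up, and the 1.5 x IQR rule for separating outliers is an assumption: it is a common convention (and Matplotlib's default whisker setting), not something this guide specifies.

```python
import numpy as np

values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])  # made-up sample

# Box: 1st quartile, median and 3rd quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers extend to the most extreme values that are not outliers
whisker_low, whisker_high = inside.min(), inside.max()
print(q1, median, q3, whisker_low, whisker_high, outliers)
```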
Example
We'll use another sample dataset included with Seaborn, called "tips". The dataset contains information about the
sex, time of day, total bill and tip amount for customers visiting a restaurant over a week.
We can use a box plot to visualize the distribution of total bill for each day of the week, segmented by whether the
customer was a smoker.
# Chart title
plt.title("Daily Total Bill")
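A sketch of how such a segmented box plot might be produced. The dataframe below is a hypothetical stand-in that borrows the day, total_bill and smoker column names from the "tips" dataset. The bill amounts are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the "tips" dataset
tips = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4,
    "total_bill": [12.5, 18.0, 21.3, 15.8, 25.1, 30.4, 17.9, 22.6],
    "smoker": ["Yes", "No"] * 4,
})

# One box per (day, smoker) pair: x picks the category, hue splits each category
ax = sns.boxplot(data=tips, x="day", y="total_bill", hue="smoker")
ax.set_title("Daily Total Bill")
```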
Bar Chart
A bar chart presents categorical data with rectangular bars with heights proportional to the values that they
represent. If there are multiple values for each category, then a bar plot can also represent the average value, with
confidence intervals.
Example
We can use a bar chart to visualize the average value of total bill for different days of the week, segmented by sex, for
the "tips" dataset.
sns.barplot(x="day", y="total_bill", hue="sex", data=tips);
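To illustrate that each bar is an average over its group, the sketch below draws the same kind of chart on a small hypothetical stand-in for "tips" (invented bill amounts) and computes the group means separately with pandas for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the "tips" dataset
tips = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4,
    "sex": ["Male", "Male", "Female", "Female"] * 2,
    "total_bill": [20.0, 24.0, 15.0, 17.0, 30.0, 26.0, 18.0, 22.0],
})

# barplot aggregates each (day, sex) group with the mean by default
# and draws an error bar around it
ax = sns.barplot(data=tips, x="day", y="total_bill", hue="sex")

# The same averages computed directly; these should match the bar heights
group_means = tips.groupby(["day", "sex"])["total_bill"].mean()
print(group_means)
```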
Further Reading
This guide is intended to serve as an introduction to the most commonly used data visualization techniques. With minor
modifications to the examples shown above, you can visualize a wide variety of datasets. Visit the official
documentation websites for more examples and tutorials:
Seaborn: https://seaborn.pydata.org/tutorial.html
Matplotlib: https://matplotlib.org/tutorials/index.html
To share your data visualizations online, just install the Jovian Python library and run jovian.commit.
import jovian
jovian.commit(project='dataviz-cheatsheet')