Data Visualization With Python For Beginners - Visualize
WITH PYTHON
FOR BEGINNERS
Visualize Your Data Using Pandas,
Matplotlib and Seaborn
AI PUBLISHING
© Copyright 2020 by AI Publishing
All rights reserved.
First Printing, 2020
Edited by AI Publishing
Ebook Converted and Cover by Gazler Studio
Published by AI Publishing LLC
ISBN-13: 978-1-7330426-8-0
Legal Notice:
You cannot amend, distribute, sell, use, quote, or paraphrase any part
of the content within this book without the consent of the author.
Disclaimer Notice:
Please note the information contained within this document is for
educational and entertainment purposes only. No warranties of
any kind are expressed or implied. Readers acknowledge that the
author is not engaging in the rendering of legal, financial, medical,
or professional advice. Please consult a licensed professional before
attempting any techniques outlined in this book.
https://www.aispublishing.net/book-data-visualization
About the Publisher
Preface
Chapter 1: Introduction
1.1. What is Data Visualization
1.2. Environment Setup
1.3. Python Crash Course
1.4. Data Visualization Libraries
Exercise 1.1
Exercise 1.2
Exercise Solutions
Exercise 1.1
Exercise 1.2
Exercise 2.1
Exercise 2.2
Exercise 3.1
Exercise 3.2
Exercise 4.1
Exercise 4.2
Exercise 5.1
Exercise 5.2
Exercise 6.1
Exercise 6.2
Exercise 7.1
Exercise 7.2
Exercise 8.1
Exercise 9.1
Exercise 9.2
Exercise 10.1
Exercise 10.2
Preface
§§ Book Approach
The book follows a very simple approach. It is divided into
10 chapters. Chapter 1 contains an introduction while the 2nd
and 3rd chapters cover the Matplotlib library. Python’s Seaborn
library is covered in the 4th and 5th chapters, while the 6th and 7th
chapters explore the Pandas library. The 8th chapter covers 3-D
plotting, while the 9th chapter explains how to draw maps via
the Basemap library. Finally, the 10th chapter covers interactive
data visualization via the Plotly library.
Get in Touch with Us
In the first chapter of this book, you will see how to set up the
Python environment needed to run various data visualization
libraries. The chapter also contains a Python crash course
for absolute beginners. Finally, it introduces the data
visualization libraries that we are going to study in this book.
The chapter ends with a simple exercise.
3. Run the executable file after the download is complete.
You will most likely find the download file in your
4. Now click I Agree on the License Agreement dialog, as
shown in the following screenshot.
6. Now, the Choose Install Location dialog will be displayed.
Change the directory if you want, but the default is
preferred. The installation folder should at least have 3
GB of free space for Anaconda. Click the Next button.
7. Go for the second option, as my Register Anaconda
8. Click Next once the installation is complete.
10. You have successfully installed Anaconda on
Windows. Excellent job. The next step is to uncheck
both checkboxes on the dialog box. Now, click on the
Finish button.
3. Run the executable file after the download is complete.
You will most likely find the downloaded file in your
download folder. The name of the file should be much
4. Now click Continue on the Welcome to Anaconda 3
Installer window, as shown in the following screenshot.
6. Click Continue on the Software License Agreement
Dialog.
8. On the next window that appears, just click Install.
The system will prompt you to give your password. Use the
same password you use to login to your Mac computer. Now,
click on Install Software.
9. Click Continue on the next window. You also have the
option to install Microsoft VSCode at this point.
The next screen will display the message that the installation
has completed successfully. Click on the Close button to close
the installer.
https://www.anaconda.com/distribution/
2. The second step is to download the installer bash script.
Log into your Linux computer and open your terminal.
Now, go to the /tmp directory and use curl to download the
installer script listed on Anaconda’s home page.
$ cd /tmp
$ curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
$ sha256sum Anaconda3-5.2.0-Linux-x86_64.sh
09f53738b0cd3bb96f5b1bac488e5528df9906be2480fe61df40e0e0d19e3d48  Anaconda3-5.2.0-Linux-x86_64.sh
Output
Output
[/home/tola/anaconda3] >>>
$ source ~/.bashrc
Output:
Script 2:
# A string variable
first_name = "Joseph"
print(type(first_name))
# An integer variable
age = 20
print(type(age))
# List
cars = ["Honda", "Toyota", "Suzuki"]
print(type(cars))
# ...
# Tuple
days = ("Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday")
print(type(days))
# Dictionary
days2 = {1: "Sunday", 2: "Monday", 3: "Tuesday",
         4: "Wednesday", 5: "Thursday", 6: "Friday", 7: "Saturday"}
print(type(days2))
Output:
<class 'str'>
<class 'int'>
<class 'float'>
<class 'bool'>
<class 'list'>
<class 'tuple'>
<class 'dict'>
Arithmetic Operators
Arithmetic operators are used to perform arithmetic operations
in Python. The following table sums up the arithmetic operators
supported by Python. Suppose X = 20 and Y = 10.
Operator Name    Symbol   Functionality                                     Example
Addition         +        Adds the operands on either side                  X + Y = 30
Subtraction      -        Subtracts the operands on either side             X - Y = 10
Multiplication   *        Multiplies the operands on either side            X * Y = 200
Division         /        Divides the operand on the left by the one        X / Y = 2.0
                          on the right
Modulus          %        Divides the operand on the left by the one        X % Y = 0
                          on the right and returns the remainder
Exponent         **       Raises the operand on the left to the power       X ** Y = 1024 x 10^10
                          of the operand on the right
Script 3:
X = 20
Y = 10
print(X + Y)
print(X - Y)
print(X * Y)
print(X / Y)
print(X ** Y)
Output:
30
10
200
2.0
10240000000000
Logical Operators
Logical operators are used to perform logical AND, OR, and
NOT operations in Python. The following table summarizes the
logical operators. Here, X is True, and Y is False.
Script 4:
X = True
Y = False
print(X and Y)
print(X or Y)
print(not(X and Y))
Output:
False
True
True
Comparison Operators
Comparison operators, as the name suggests, are used to
compare two or more than two operands. Depending upon the
relation between the operands, comparison operators return
Script 5:
X = 20
Y = 35
print(X == Y)
print(X != Y)
print(X > Y)
print(X < Y)
print(X >= Y)
print(X <= Y)
Output:
False
True
False
True
False
True
Assignment Operators
Assignment operators are used to assign values to variables.
The following table summarizes the assignment operators.
Here, X is 20, and Y is equal to 10.
Script 6:
X = 20
Y = 10
R = X + Y
print(R)

X = 20
Y = 10
X += Y
print(X)

X = 20
Y = 10
X -= Y
print(X)

X = 20
Y = 10
X *= Y
print(X)

X = 20
Y = 10
X /= Y
print(X)

X = 20
Y = 10
X %= Y
print(X)

X = 20
Y = 10
X **= Y
print(X)
Output:
30
30
10
200
2.0
0
10240000000000
Membership Operators
Membership operators are used to find if an item is a member of
a collection of items or not. There are two types of membership
operators. They are the in operator and the not in operator.
The following script shows the in operator in action.
Script 7:
Output:
True
Script 8:
Output:
True
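As a minimal sketch of both membership operators, assume a hypothetical tuple of weekday names (the name days and its values are illustrative and are not taken from the scripts above):

```python
# Hypothetical collection used only for illustration
days = ("Sunday", "Monday", "Tuesday", "Wednesday")

# `in` evaluates to True when the item is present in the collection
is_member = "Sunday" in days
print(is_member)

# `not in` evaluates to True when the item is absent from the collection
is_absent = "Friday" not in days
print(is_absent)
```

Both expressions evaluate to Boolean values, so they can be used directly as the condition of an if statement.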
IF Statement
If you have to check a single condition and you are not
concerned with the alternate condition, you can use the if
statement. For instance, if you want to check whether 10 is greater
than 5 and print a statement based on the result, you
can use the if statement. The condition evaluated by the if
statement returns a Boolean value. If the condition
is true, the code block that follows the if
statement executes. It is important to mention that in Python,
a new code block starts on a new line and is indented one level
(conventionally four spaces or one tab) relative to the outer block.
Script 8:
# The if statement
if 10 > 5:
    print("Ten is greater than 5")
Output:
IF-Else Statement
The if-else statement comes in handy when you want to
execute an alternate piece of code in case the condition for
the if statement returns false. For instance, in the following
example, the condition 5 > 10 will return false. Hence, the code
block that follows the else statement will execute.
Script 9:
# if-else statement
if 5 > 10:
    print("5 is greater than 10")
else:
    print("10 is greater than 5")
Output:
10 is greater than 5
IF-Elif Statement
The if-elif statement comes in handy when you have to
evaluate multiple conditions. For instance, in the following
example, we first check if 5 > 10, which evaluates to false. Next,
an elif statement evaluates the condition 8 < 4, which also
returns false. Hence, the code block that follows the last else
statement executes.
Script 10:
if 5 > 10:
    print("5 is greater than 10")
elif 8 < 4:
    print("8 is smaller than 4")
else:
    print("5 is not greater than 10 and 8 is not smaller than 4")
Output:
Script 11:
items = range(5)
for item in items:
    print(item)
Output:
0
1
2
3
4
While Loop
The while loop keeps executing a certain piece of code until
the evaluation condition becomes false. For instance, the while
loop in the following script keeps executing until the variable c
reaches 10.
Script 12:
c = 0
while c < 10:
    print(c)
    c = c + 1
Output:
0
1
2
3
4
5
6
7
8
9
1.3.9. Functions
Functions, in any programming language, are used to
implement a piece of code that needs to be executed
numerous times at different locations in the code. In such
cases, instead of writing the same long piece of code again and again,
you can simply define a function that contains the piece of
code, and then call the function wherever you want
in the code.
Script 13:
def myfunc():
    print("This is a simple function")

# Call the function to execute its body
myfunc()
Output:
You can also pass values to a function. The values are passed
inside the parentheses of the function call. However, you
must specify the parameter name in the function definition,
too. In the following script, we define a function named
myfuncparam() that accepts one parameter, num.
Script 14:
def myfuncparam(num):
    # num must be a string for the concatenation below to work
    print("This is a function with parameter value: " + num)
Output:
Script 15:
def myreturnfunc():
    return "This function returns a value"

val = myreturnfunc()
print(val)
Output:
Script 16:
class Fruit:
    name = "apple"
    price = 10

    def eat_fruit(self):
        print("Fruit has been eaten")

f = Fruit()
f.eat_fruit()
print(f.name)
print(f.price)
Output:
Script 17:
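class Fruit:
    name = "apple"
    price = 10

    # Constructor: stores the values passed when the object is created
    # (the parameter names here are illustrative)
    def __init__(self, fruit_name, fruit_price):
        self.name = fruit_name
        self.price = fruit_price

    def eat_fruit(self):
        print("Fruit has been eaten")

f = Fruit("Orange", 15)
f.eat_fruit()
print(f.name)
print(f.price)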
Output:
1.4.1. Matplotlib
Matplotlib is the de facto standard for static data visualization
in Python. Being the oldest data visualization library in Python,
Matplotlib is the most widely used data visualization library.
Matplotlib was developed to resemble MATLAB, which is one
of the most widely used programming languages in academia.
While Matplotlib graphs are easy to plot, the default look and feel
of Matplotlib plots is reminiscent of the 1990s.
Many wrapper libraries, such as Pandas and Seaborn, have been
developed on top of Matplotlib. These libraries allow users to
plot cleaner and more sophisticated graphs.
1.4.2. Seaborn
The Seaborn library is built on top of the Matplotlib library and
gives access to Matplotlib's plotting capabilities. However,
with Seaborn, you can plot more aesthetically pleasing
graphs with the help of Seaborn's default styles and color
palettes.
1.4.3. Basemap
The Basemap library is a Matplotlib extension and is used
to plot geographical maps in Python. The working of the
Basemap library is explained in detail in chapter 9 of
this book.
1.4.4. Pandas
The Pandas library, like Seaborn, builds its plotting on the Matplotlib library
and offers utilities that can be used to plot different types
of static plots in a single line of code. With Pandas, you can
import data in various formats such as CSV (Comma-Separated
Values) and TSV (Tab-Separated Values), and can plot a variety of
data visualizations from these data sources.
1.4.5. Plotly
Plotly is an online data visualization platform that supports
interactive data visualization. However, you can also create
interactive visualizations within the Python notebook using
Plotly. Chapter 10 explains how to use Plotly for interactive
data visualization in Python.
Exercise 1.1
Question 1
A- For Loop
B- While Loop
C- Both A & B
D- None of the above
Question 2
A- Single Value
B- Double Value
C- More than two values
D- None
Question 3
A- In
B- Out
C- Not In
D- Both A and C
Answer: D
Exercise 1.2
Print the table of integer 9 using a while loop:
Chapter 2: Basic Plotting with Matplotlib
2.1. Introduction
In the first chapter of the book, you saw briefly what data
visualization is, why it is important, and what its various
applications are. You also installed different software that we
will be using in order to execute data visualization scripts in
this book.
In this chapter, you will see how to draw some of the most
commonly used plots with the Matplotlib library.
Finally, before you can plot any graphs with Matplotlib library,
you will need to import the pyplot module from the Matplotlib
library. And since all the scripts will be executed inside Jupyter
notebook, the statement %matplotlib inline has been used to
generate plots inside Jupyter notebook. Execute the following
script:
Script 1:
Output:
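A minimal sketch of the import step described above. It assumes the Agg backend so that it also runs outside Jupyter; inside a notebook, the %matplotlib inline statement takes the place of the backend line. The x_vals and y_vals data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

# Inside a Jupyter notebook you would run this magic command instead:
# %matplotlib inline

# Hypothetical data: five x values and their squares
x_vals = [1, 2, 3, 4, 5]
y_vals = [x ** 2 for x in x_vals]

# plot() is called directly from the pyplot module and returns a list of lines
lines = plt.plot(x_vals, y_vals)
```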
Script 2:
fig = plt.figure()
ax = plt.axes()
ax.plot(x_vals, y_vals)
Here is the output of the above script. This method can be used
to plot multiple plots, which we will see in the next chapter. In
this chapter, we will stick to the first approach, where we call
the plot() method directly from the pyplot module.
Output:
You can also increase the default plot size of a Matplotlib plot.
To do so, you can use the rcParams dictionary of the pyplot module
and set two values for the figure.figsize key. The
following script sets the plot size to 8 inches wide and 6 inches
tall.
Script 3:
plt.rcParams["figure.figsize"] = [8, 6]
In the output, it is evident that the default plot size has been
increased.
Output:
Script 4:
Here in the output, you can see the labels and titles that you
specified in the script 4.
Output:
Script 5:
Output:
Script 6:
Output:
You can also plot multiple line plots inside one graph. All
you have to do is call the plot() method twice with different
values for x and y axes. The following script plots a line plot
for square root in red and for a cube function in blue.
Script 7:
Output:
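A minimal sketch of two line plots in one graph, following the description above: the square root in red and the cube function in blue. The input range, variable names, and legend labels are assumptions made for illustration.

```python
import math
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

# Hypothetical inputs: the integers 0 through 10
x_vals = list(range(0, 11))
y_sqrt = [math.sqrt(x) for x in x_vals]
y_cube = [x ** 3 for x in x_vals]

fig = plt.figure()
# Each call to plot() adds one more line to the same axes
plt.plot(x_vals, y_sqrt, "r", label="Square root")
plt.plot(x_vals, y_cube, "b", label="Cube")
plt.legend()

line_labels = [line.get_label() for line in plt.gca().get_lines()]
```

Because the two functions differ greatly in scale, the square-root line will look almost flat next to the cube line; plotting them on separate axes is one alternative.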
Script 8:
import pandas as pd
data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\iris_data.csv")
If you do not see any error, the file has been read successfully.
To see the first five rows of the Pandas dataframe containing
the data, you can use the head() method as shown below:
Script 9:
data.head()
Output:
You can see that the iris_data.csv file has five columns. We
can use values from any two of these columns to plot a line
plot. To do so, for x and y axes, we need to pass the data
dataframe column names to the plot() function of the pyplot
module. To access a column name from a Pandas dataframe,
you need to specify the dataframe name followed by a pair
of square brackets. Inside the brackets, the column name is
specified. The following script plots a line plot, where the x-axis
contains values from the sepal_length column, whereas the
y-axis contains values from the petal_length column of the
dataframe.
Script 10:
plt.xlabel('Sepal Length')
plt.ylabel('Petal Length')
plt.title('Sepal vs Petal Length')
plt.plot(data["sepal_length"], data["petal_length"], 'b')
Script 11:
import pandas as pd
data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\iris_data.tsv", sep='\t')
data.head()
Output:
The remaining process to plot the line plot remains the same
as it was for the CSV file. The following script plots a line plot
where the x-axis contains sepal length, and the y-axis displays
petal length.
Script 12:
plt.xlabel('Sepal Length')
plt.ylabel('Petal Length')
plt.title('Sepal vs Petal Length')
plt.plot(data["SepalLength"], data["PetalLength"], "b")
Output:
Script 13:
plt.xlabel('Sepal Length')
plt.ylabel('Petal Length')
plt.title('Sepal vs Petal Length')
plt.scatter(data["SepalLength"], data["PetalLength"], c="b")
The output shows a scatter plot with blue points. The plot
clearly shows that with an increase in sepal length, the petal
length of an iris flower also increases.
Output:
Script 14:
plt.xlabel('Sepal Length')
plt.ylabel('Petal Length')
plt.title('Sepal vs Petal Length')
plt.scatter(data["SepalLength"], data["PetalLength"], c="b", marker="x")
Output:
Like line plots, you can plot multiple scatter plots inside one
graph. To do so, call the scatter() method twice,
with the same values for the x-axis but different values for the
y-axis. In the following script, you will see two scatter plots.
The first scatter plot shows the relation between sepal length and petal
length using blue markers, and the second shows
the relation between sepal length and sepal width using red
markers.
Script 15:
plt.xlabel('Sepal Length')
plt.ylabel('Petal Length')
plt.title('Sepal vs Petal Length')
plt.scatter(data["SepalLength"], data["PetalLength"], c="b", marker="x", label="Petal Length")
plt.scatter(data["SepalLength"], data["SepalWidth"], c="r", marker="o", label="Sepal Width")
plt.legend(loc='upper center')
Output:
Script 16:
import pandas as pd
data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\titanic_data.csv")
data.head()
Output:
To plot a bar plot, you need to call the bar() method. The
categorical values are passed on the x-axis, and the corresponding
aggregated numerical values are passed on the y-axis. The
following script plots a bar plot of the passengers' ages against their genders
for the Titanic dataset.
Script 17:
plt.xlabel('Gender')
plt.ylabel('Ages')
plt.title('Gender vs Age')
plt.bar(data["Sex"], data["Age"])
Output:
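When raw rows are passed to bar(), Matplotlib draws one bar per row, so bars that share a category overlap. A minimal sketch of aggregating first, assuming the mean as the aggregate and a small hypothetical stand-in for the Titanic data:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a few rows of the Titanic dataset
data = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0, 30.0],
})

# Aggregate: one mean age per gender
mean_age = data.groupby("Sex")["Age"].mean()

fig = plt.figure()
plt.xlabel("Gender")
plt.ylabel("Mean Age")
plt.title("Gender vs Mean Age")
# One bar per category, with the aggregated value as its height
bars = plt.bar(list(mean_age.index), list(mean_age.values))
```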
You can also create horizontal bar plots. To do so, you need to
call the barh() method, as shown below:
Script 18:
plt.xlabel('Ages')
plt.ylabel('Class')
plt.title('Class vs Age')
plt.barh(data["Pclass"], data["Age"])
Output:
2.8. Histograms
Histograms are used to display the distribution of data for a
numeric list of items. To plot a histogram, the hist() method
is used. You simply have to pass a collection of numeric values
to the hist() method. For instance, the following histogram
plots the distribution of values in the Age column of the Titanic
dataset.
Script 19:
plt.title('Age Histogram')
plt.hist(data["Age"])
Output:
Script 20:
plt.title('Fare Histogram')
plt.hist(data["Fare"])
Output:
Script 21:
plt.title('Age Histogram')
plt.hist(data["Age"], bins=5)
Output:
Script 22:
Output:
In the previous section, you saw how to plot a pie plot using
raw values. Let’s see how to plot a pie plot using a Pandas
dataframe as the source.
Script 23:
import pandas as pd
data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\titanic_data.csv")
pclass = data["Pclass"].value_counts()
print(pclass)
Here is the output.
Output:
Script 24:
print(pclass.index.values.tolist())
print(pclass.values.tolist())
Output:
Script 25:
labels = pclass.index.values.tolist()
values = pclass.values.tolist()
explode = (0.05, 0.05, 0.05)
Output:
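Script 25 prepares the labels, values, and explode offsets. A minimal sketch of passing them to the pie() method follows; the label strings and counts are illustrative stand-ins for the lists derived from pclass, and the autopct format is the one that appears in Exercise 2.1.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

# Illustrative stand-ins for pclass.index.values.tolist() and pclass.values.tolist()
labels = ["Class 3", "Class 1", "Class 2"]
values = [491, 216, 184]
explode = (0.05, 0.05, 0.05)  # offset of each wedge from the center

fig = plt.figure()
# autopct formats the percentage text drawn inside each wedge
wedges, texts, autotexts = plt.pie(
    values, explode=explode, labels=labels, autopct="%1.1f%%"
)
```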
Script 26:
London = [25,26,32,19,28,39,24]
Tokyo = [20,29,23,35,32,26,18]
Paris= [18,21,28,35,29,25,22]
plt.legend()
plt.show()
Output:
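A minimal sketch of how the three city lists from Script 26 could be drawn as a stack plot with the stackplot() method. The year list is a hypothetical x-axis; only the three value lists come from the script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

# Hypothetical x-axis values, one per entry in each list
year = [2011, 2012, 2013, 2014, 2015, 2016, 2017]

London = [25, 26, 32, 19, 28, 39, 24]
Tokyo = [20, 29, 23, 35, 32, 26, 18]
Paris = [18, 21, 28, 35, 29, 25, 22]

fig = plt.figure()
# Each series is stacked on top of the previous one
layers = plt.stackplot(year, London, Tokyo, Paris,
                       labels=["London", "Tokyo", "Paris"])
plt.legend()
```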
Exercise 2.1
Question 1:
A- color
B- c
C- r
D- None of the above
Question 2:
A- title
B- label
C- axis
D- All of the above
Question 3:
A - autopct = '%1.1f%%'
B - percentage = '%1.1f%%'
C - perc = '%1.1f%%'
D - None of the Above
Exercise 2.2
Create a pie chart that shows the distribution of passengers
with respect to their gender on the ill-fated Titanic. You
can use the Titanic dataset from the resources for that purpose.
Chapter 3: Advanced Plotting with Matplotlib
3.1. Introduction
In the second chapter, we started our discussion about the
Matplotlib library and its basic plotting functions. In this
chapter, you will strengthen the knowledge that you gained
in the previous chapter. You will learn how to plot multiple
plots using Matplotlib, how to plot subplots, and how to save
Matplotlib plots to your local drive.
Script 1:
plt.rcParams["figure.figsize"] = [12,8]
plt.subplot(2,2,1)
plt.plot(x_vals, y_vals, 'bo-')
plt.subplot(2,2,2)
plt.plot(x_vals, y_vals, 'rx-')
plt.subplot(2,2,3)
plt.plot(x_vals, y_vals, 'g*-')
plt.subplot(2,2,4)
plt.plot(x_vals, y_vals, 'g*-')
Output:
Script 2:
plt.rcParams["figure.figsize"] = [12,8]
plt.subplot(2,3,1)
plt.plot(x_vals, y_vals, 'bo-')
plt.subplot(2,3,2)
plt.plot(x_vals, y_vals, 'rx-')
plt.subplot(2,3,3)
plt.plot(x_vals, y_vals, 'g*-')
plt.subplot(2,3,4)
plt.plot(x_vals, y_vals, 'g*-')
plt.subplot(2,3,5)
plt.plot(x_vals, y_vals, 'bo-')
plt.subplot(2,3,6)
plt.plot(x_vals, y_vals, 'rx-')
Output:
Script 3:
plt.rcParams["figure.figsize"] = [12,8]
plt.subplot(2,3,1)
plt.plot(x_vals, y_vals, 'bo-')
plt.subplot(2,3,2)
plt.plot(x_vals, y_vals, 'rx-')
plt.subplot(2,3,3)
plt.plot(x_vals, y_vals, 'g*-')
plt.subplot(2,3,4)
plt.plot(x_vals, y2_vals, 'g*-')
plt.subplot(2,3,5)
plt.plot(x_vals, y2_vals, 'bo-')
plt.subplot(2,3,6)
plt.plot(x_vals, y2_vals, 'rx-')
Output:
Script 4:
plt.rcParams["figure.figsize"] = [12,8]
figure = plt.figure()
Output:
Script 5:
plt.rcParams["figure.figsize"] = [12,8]
figure = plt.figure()
axes.plot(x_vals, y_vals)
axes.set_xlabel('X Axis')
axes.set_ylabel('Y Axis')
Output:
Script 6:
plt.rcParams["figure.figsize"] = [12,8]
figure = plt.figure()
In the output below, you can see a line plot inside another plot.
Output:
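A minimal sketch of one way to place a line plot inside another plot, assuming the add_axes() method of the figure object is used. The position values and the data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

# Hypothetical data: squares for the outer plot, cubes for the inner plot
x_vals = list(range(0, 11))
y_vals = [x ** 2 for x in x_vals]
y2_vals = [x ** 3 for x in x_vals]

figure = plt.figure()
# add_axes() takes [left, bottom, width, height] as fractions of the figure
outer_axes = figure.add_axes([0.1, 0.1, 0.8, 0.8])
inner_axes = figure.add_axes([0.2, 0.55, 0.3, 0.3])

outer_axes.plot(x_vals, y_vals, "b")
outer_axes.set_title("Outer plot")
inner_axes.plot(x_vals, y2_vals, "r")
inner_axes.set_title("Inner plot")
```

The second set of axes sits entirely within the bounds of the first, which is what makes it appear as a plot inside a plot.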
Script 7:
plt.rcParams["figure.figsize"] = [12,8]
Output:
Script 8:
plt.rcParams["figure.figsize"] = [12,8]
Output:
Script 9:
plt.rcParams["figure.figsize"] = [12,8]
figure.savefig(r'E:/Subplots.jpg')
Output:
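A minimal sketch of saving a figure with savefig(). It assumes a temporary directory as the target instead of the E: drive path used above, and the PNG format instead of JPG; the file name and data are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

# Hypothetical data for two side-by-side subplots
x_vals = list(range(0, 11))
y_vals = [x ** 2 for x in x_vals]
y2_vals = [x ** 3 for x in x_vals]

figure, axes = plt.subplots(nrows=1, ncols=2)
axes[0].plot(x_vals, y_vals, "bo-")
axes[1].plot(x_vals, y2_vals, "rx-")

# The file extension determines the output format
out_path = os.path.join(tempfile.gettempdir(), "subplots_sketch.png")
figure.savefig(out_path)

saved = os.path.exists(out_path)
```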
Exercise 3.1
Question 1:
Which plot function will you use to plot a graph in the 5th cell
of a multiple plot figure with four rows and two columns?
A- plt.subplot(5,4,2)
B- plt.subplot(2,4,5)
C- plt.subplot(4,2,5)
D- None of the Above
Question 2:
How will you create a subplot with five rows and three
columns using subplots() function?
A- plt.subplots(nrows=5, ncols=3)
B- plt.subplots(5,3)
C- plt.subplots(rows=5, cols=3)
D- All of the Above
Question 3
A- figure.saveimage()
B- figure.savegraph()
C- figure.saveplot()
D- figure.savefig()
Exercise 3.2
Draw multiple plots with three rows and one column. Show
the sine of any 30 integers in the first plot, the cosine of the
same 30 integers in the second plot, and the tangent of the
same 30 integers in the 3rd plot.
Chapter 4: Introduction to the Python Seaborn Library
4.1. Introduction
In the previous two chapters, you saw how to plot different
types of graphs using Python’s Matplotlib library. In this
chapter, you will see how to perform data visualization with
Seaborn, which is yet another extremely handy Python library
for data visualization. The Seaborn library is based on the
Matplotlib library. Therefore, you will also need to import the
Matplotlib library before you plot any Seaborn graph.
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams["figure.figsize"] = [10,8]
tips_data = sns.load_dataset('tips')
tips_data.head()
Output:
The tips dataset contains records of the bills paid by customers
at a restaurant. The dataset contains seven columns: total_bill, tip,
sex, smoker, day, time, and size. You do not have to download
this dataset as it comes built-in with the Seaborn library. We
will be using the tips dataset to plot some of the Seaborn
plots. So, without further ado, let’s start plotting with Seaborn.
Script 1:
plt.rcParams["figure.figsize"] = [10,8]
sns.distplot(tips_data['total_bill'])
Output:
Similarly, the following script plots a dist plot for the tip
column of the tips dataset.
Script 2:
sns.distplot(tips_data['tip'])
Output:
Script 3:
Output:
Script 4:
Output:
Script 5:
Output:
Script 6:
sns.pairplot(data=tips_data)
Output:
You can also plot multiple pair plots per value in a categorical
column. To do so, you need to pass the name of the categorical
column as the value for the hue parameter. The following
script plots two pair plots (one for lunch and one for dinner)
for every combination of numeric or Boolean columns.
Script 7:
Output:
Script 8:
sns.rugplot(tips_data['total_bill'])
Output:
Script 9:
plt.rcParams["figure.figsize"] = [8,6]
sns.set_style("darkgrid")
titanic_data = sns.load_dataset('titanic')
titanic_data.head()
Output:
Script 10:
Output:
You can further categorize the bar plot using the hue attribute.
For example, the following bar plot plots the average ages
of passengers traveling in different classes and further
categorized based on their genders.
Script 11:
Output:
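A minimal sketch of a bar plot further categorized with hue. It assumes a small hypothetical DataFrame that mimics the pclass, sex, and age columns of the Titanic dataset.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows shaped like the Titanic dataset
titanic_like = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "sex": ["male", "female", "male", "female", "male",
            "female", "male", "female", "male", "female"],
    "age": [54, 38, 45, 27, 35, 14, 22, 26, 20, 4],
})

fig = plt.figure()
# By default, each bar shows the mean age for one class and gender combination
ax = sns.barplot(x="pclass", y="age", hue="sex", data=titanic_like)
```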
You can also plot multiple bar plots depending upon the
number of unique values in a categorical column. To do so, you
need to call the catplot() function and pass the categorical
column name as the value for the col attribute. The
following script plots two bar plots: one for the passengers
who survived the Titanic accident and one for those who
didn’t survive.
Script 12:
Output:
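A minimal sketch of the catplot() call described above, assuming kind="bar" and a small hypothetical DataFrame with pclass, age, and survived columns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical rows shaped like the Titanic dataset
titanic_like = pd.DataFrame({
    "pclass": [1, 2, 3, 1, 2, 3, 1, 2, 3, 3],
    "age": [38, 27, 22, 54, 35, 4, 58, 14, 20, 39],
    "survived": [1, 1, 1, 0, 0, 0, 1, 0, 1, 0],
})

# col creates one bar plot per unique value in the "survived" column
grid = sns.catplot(x="pclass", y="age", col="survived",
                   data=titanic_like, kind="bar")
```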
Script 13:
sns.countplot(x='pclass', data=titanic_data)
Output:
Like a bar plot, you can also further categorize the count plot
by passing a value for the hue parameter. The following script
plots a count plot for the passengers traveling in different
classes of the Titanic ship categorized further by their genders.
Script 14:
Output:
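A minimal sketch of a count plot categorized by gender through the hue parameter, again assuming a small hypothetical DataFrame in place of the Titanic dataset.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows shaped like the Titanic dataset
titanic_like = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "sex": ["male", "female", "male", "female",
            "male", "male", "female", "male"],
})

fig = plt.figure()
# Each bar counts the rows for one class and gender combination
ax = sns.countplot(x="pclass", hue="sex", data=titanic_like)
```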
Script 15:
sns.boxplot(x=titanic_data["fare"])
Output:
Similarly, the following script plots the vertical box plot for the
fare column of the Titanic dataset.
Script 16:
sns.boxplot(y=titanic_data["fare"])
Output:
You can also plot multiple box plots for every unique value in
a categorical column. For instance, the following script plots
box plots for the age column of the passengers who traveled
alone as well as for passengers who were accompanied by at
least one other passenger.
Script 17:
Output:
24 and 30. In the same way, you can get information about
the 3rd and 4th age quartile of the passengers traveling alone.
A comparison of the two box plots reveals that the median
age of the passengers traveling alone is slightly greater than
the median age of the passengers accompanied by other
passengers.
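A minimal sketch of box plots drawn per value of a categorical column, assuming a small hypothetical DataFrame with a Boolean alone column and a numeric age column.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows shaped like the Titanic dataset
titanic_like = pd.DataFrame({
    "alone": [True, True, True, True, True,
              False, False, False, False, False],
    "age": [22, 35, 54, 26, 31, 4, 28, 45, 2, 38],
})

fig = plt.figure()
# One box plot of age is drawn for each unique value of "alone"
ax = sns.boxplot(x="alone", y="age", data=titanic_like)
```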
Like bar and count plots, the hue attribute can also be used to
categorize box plots.
For instance, the following script plots box plots for the
passengers traveling alone and along with other passengers,
further categorized based on their genders.
Script 18:
Output:
Script 19:
Output:
You can see that the output doesn’t contain any outliers for
the box plots.
Script 20:
Output:
Script 21:
Output:
For a better comparison and to save space, you can also plot
split violin plots. In split violin plots, each half corresponds to
one value in a category column. For instance, the following
script plots two violin plots—one each for the passengers
traveling alone and for the passengers not traveling alone.
Each plot is further split into two parts based on the genders
of the passengers.
Script 22:
Output:
Script 23:
Output:
Script 24:
Output:
Finally, like violin plots, you can also split strip plots, as
demonstrated by the following example.
Script 25:
Output:
Script 26:
Output:
With the hue parameter, you can further categorize the swarm
plot, as shown in the following script.
Script 27:
Output:
Finally, you can split swarm plots by setting the value of the
split attribute to True (recent Seaborn releases name this
parameter dodge), as shown below.
Script 28:
Output:
Exercise 4.1
Question 1
Which plot is used to plot multiple joint plots for all the
combinations of numeric and Boolean columns in a dataset?
A- Joint Plot
B- Pair Plot
C- Dist Plot
D- Scatter Plot
Answer: B
Question 2
A- barplot()
B- jointplot()
C- catplot()
D- mulplot()
Answer: C
Question 3
A- kind
B- type
C- hue
D- col
Answer: A
Exercise 4.2
Plot a swarm violin plot using Titanic data that displays the
fare paid by male and female passengers.
References
1. https://seaborn.pydata.org/generated/seaborn.distplot.html
2. https://seaborn.pydata.org/generated/seaborn.jointplot.html
3. https://seaborn.pydata.org/generated/seaborn.pairplot.html
4. https://seaborn.pydata.org/generated/seaborn.rugplot.html
5. https://seaborn.pydata.org/generated/seaborn.barplot.html
6. https://seaborn.pydata.org/generated/seaborn.countplot.html
7. https://seaborn.pydata.org/generated/seaborn.boxplot.html
8. https://seaborn.pydata.org/generated/seaborn.violinplot.html
9. https://seaborn.pydata.org/generated/seaborn.stripplot.html
10. https://seaborn.pydata.org/generated/seaborn.swarmplot.html
5
Advanced Plotting
with Seaborn
Let’s first import the tips dataset from the Seaborn library.
Script 1:
plt.rcParams["figure.figsize"] = [10,8]
tips_data = sns.load_dataset('tips')
tips_data.head()
Output:
Let’s now plot a scatter plot with the values from the total_bill
column of the tips dataset on the x-axis and values from
the tip column on the y-axis. To plot a scatter plot, you need
to call the scatterplot() function of the Seaborn library.
Script 2:
Output:
To change the color of the scatter plot, pass a color code, such
as the first letter of a basic color (b for blue, g for green, r for
red), to the color attribute of the scatterplot() function.
Script 3:
Output:
Finally, to change the marker shape for the scatter plot, you
need to pass a value for the marker attribute. For example, the
following scatter plot plots blue x markers on the scatter plot.
Script 4:
Output:
Script 5:
sns.set_style('darkgrid')
sns.scatterplot(x="total_bill", y="tip", data=tips_data,
color = 'b', marker = 'x')
Output:
Script 6:
sns.set_style('whitegrid')
sns.scatterplot(x="total_bill", y="tip", data=tips_data,
color = 'b', marker = 'x')
Output:
In addition to styling the background, you can style the plot for
different devices via the set_context() function. By default,
the context is set to notebook. However, if you want to plot
your plot on a poster, you can pass poster as a parameter to
the set_context() function. In the output, you will see a plot
with bigger annotations, as shown below.
Script 7:
sns.set_context('poster')
sns.scatterplot(x="total_bill", y="tip", data=tips_data,
color = 'b', marker = 'x')
Output:
Script 8:
plt.rcParams["figure.figsize"] = [8,6]
sns.set_style("darkgrid")
titanic_data = sns.load_dataset('titanic')
titanic_data.head()
Output:
Script 9:
titanic_data.corr()
Output:
From the output above, you can see that we now have
meaningful information across rows as well. In the following
script, we first increase the default plot size and then pass
the correlation matrix of the Titanic dataset to the heatmap()
function to create a heat map.
Script 10:
plt.rcParams["figure.figsize"] = [10,8]
corr_values = titanic_data.corr()
sns.heatmap(corr_values, annot=True)
You can see a heat map in the output, as shown below. The
higher the correlation is, the darker the cell containing the
correlation.
Output:
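The correlation-then-heat-map workflow can be sketched on a tiny dataframe. The column names and values below are made up for illustration. Every column is numeric, so corr() can be called without any extra arguments.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical numeric data: cheaper fares go with higher class numbers
records = pd.DataFrame({
    "fare": [80.0, 70.0, 30.0, 25.0, 8.0, 7.0],
    "pclass": [1, 1, 2, 2, 3, 3],
    "age": [45.0, 38.0, 30.0, 33.0, 22.0, 27.0],
})

# corr() returns a square matrix with meaningful labels on rows and columns
corr_values = records.corr()

# annot=True writes each correlation value inside its cell
ax = sns.heatmap(corr_values, annot=True)
print(corr_values)
```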
You can see that the above plot is cropped at the top and
bottom. The following script plots the uncropped plot. In the
following script, we use the set_ylim() method to extend the
y-axis limits by 0.5 at the bottom and at the top, so that the
first and last rows are displayed in full.
Script 11:
plt.rcParams["figure.figsize"] = [10,8]
corr_values = titanic_data.corr()
ax = sns.heatmap(corr_values, annot=True)
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)
Output:
Script 12:
plt.rcParams["figure.figsize"] = [10,8]
corr_values = titanic_data.corr()
ax = sns.heatmap(corr_values, annot=True, cmap='coolwarm')
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)
Output:
Let’s import the flights dataset from the Seaborn library. The
flights dataset contains records of the passengers traveling
each month from 1949 to 1960.
Script 13:
plt.rcParams["figure.figsize"] = [10,8]
flights_data = sns.load_dataset('flights')
flights_data.head()
Output:
Script 14:
flights_data_pivot = flights_data.pivot_table(index='month',
columns='year', values='passengers')
ax = sns.heatmap(flights_data_pivot, cmap='coolwarm',
linecolor='black', linewidth=1)
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)
Output:
Script 15:
flights_data_pivot = flights_data.pivot_table(index='month',
columns='year', values='passengers')
ax = sns.clustermap(flights_data_pivot, cmap='coolwarm')
Output:
As with a heat map, you can also specify the color and width of
the lines separating cells in a cluster map. Here is an example:
Script 16:
flights_data_pivot = flights_data.pivot_table(index='month',
columns='year', values='passengers')
ax = sns.clustermap(flights_data_pivot, cmap='coolwarm',
linecolor='black', linewidth=1)
Output:
Before we see pair grids in action, let’s review how the pair
plot works. The following script plots the pair plot for the tips
dataset.
Script 17:
plt.rcParams["figure.figsize"] = [10,8]
tips_data = sns.load_dataset('tips')
sns.pairplot(tips_data)
Output:
Let’s now plot a pair grid for the tips dataset. To do so, you
have to pass the Pandas dataframe containing the tips dataset
to the PairGrid() function, as shown below.
Script 18:
sns.PairGrid(tips_data)
Output:
Script 19:
pgrids = sns.PairGrid(tips_data)
pgrids.map(plt.scatter)
Output:
With a pair grid, you can plot different types of plots on the
diagonal, above the diagonal, and below the diagonal.
For instance, the following pair
Script 20:
pgrids = sns.PairGrid(tips_data)
pgrids.map_diag(sns.distplot)
pgrids.map_upper(sns.kdeplot)
pgrids.map_lower(plt.scatter)
Output:
Script 21:
You can see gender across columns and time across rows, as
specified by the FacetGrid() function’s col and row attributes,
respectively.
Output:
Similarly, you can use the facet grid to plot scatter plots for
the total_bill and tip columns, with respect to the sex and
time columns.
Script 22:
Output:
Script 23:
Output:
You can plot regression plots for two columns on the y-axis. To
do so, you need to pass a column name for the hue parameter
of the lmplot() function.
Script 24:
Output:
Script 25:
Output:
Exercise 5.1
Question 1
A- set_style('darkgrid')
B- set_style('whitegrid')
C- set_style('poster')
D- set_context('poster')
Question 2
A- correlation()
B- corr()
C- heatmap()
D- none of the above
Question 3
A- annotate()
B- annot()
C- mark()
D- display()
Exercise 5.2
Plot two scatter plots on the same graph using the tips
dataset. In the first scatter plot, display values from the total_bill
column on the x-axis and from the tip column on the y-axis.
The color of the first scatter plot should be green. In the second
scatter plot, display values from the total_bill column on the
x-axis and from the size column on the y-axis. The color of the
second scatter plot should be blue, and markers should be x.
References
1. https://seaborn.pydata.org/generated/seaborn.scatterplot.html
2. https://seaborn.pydata.org/tutorial/aesthetics.html
3. https://seaborn.pydata.org/generated/seaborn.heatmap.html
4. https://seaborn.pydata.org/generated/seaborn.clustermap.html
5. https://seaborn.pydata.org/generated/seaborn.PairGrid.html
6. https://seaborn.pydata.org/generated/seaborn.FacetGrid.html
7. https://seaborn.pydata.org/generated/seaborn.lmplot.html
6
Introduction to Pandas
Library for Data Analysis
6.1. Introduction
In this chapter, you will see how to use Python’s Pandas library
for data analysis. In the next chapter, you will see how to use
the Pandas library for data visualization by plotting different
types of plots.
import pandas as pd
Script 1:
import pandas as pd
titanic_data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\titanic_data.csv")
titanic_data.head()
Output:
The read_csv() method reads data from a CSV or TSV file and
stores it in a Pandas dataframe, which is a special object that
stores data in the form of rows and columns.
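If you do not have the CSV file at hand, the behavior of read_csv() can be sketched with an in-memory string. The three records below are made up for illustration. The sep parameter is what lets the same function read tab-separated (TSV) data.

```python
import io
import pandas as pd

# A tiny CSV document held in memory instead of a file on disk
csv_text = "PassengerId,Pclass,Age\n1,3,22\n2,1,38\n3,3,26\n"
csv_frame = pd.read_csv(io.StringIO(csv_text))

# The same records separated by tabs; sep="\t" tells Pandas how to split them
tsv_text = csv_text.replace(",", "\t")
tsv_frame = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(csv_frame.shape)  # rows and columns of the resulting dataframe
print(csv_frame.head())
```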
Script 2:
titanic_pclass1= (titanic_data.Pclass == 1)
titanic_pclass1
Output:
0 False
1 True
2 False
3 True
4 False
...
886 False
887 True
888 False
889 True
890 False
Name: Pclass, Length: 891, dtype: bool
Script 3:
titanic_pclass1= (titanic_data.Pclass == 1)
titanic_pclass1_data = titanic_data[titanic_pclass1]
titanic_pclass1_data.head()
Output:
Script 4:
titanic_pclass_data = titanic_data[titanic_data.Pclass == 1]
titanic_pclass_data.head()
Output:
Script 5:
ages = [20,21,22]
age_dataset = titanic_data[titanic_data["Age"].isin(ages)]
age_dataset.head()
Output:
Script 6:
ages = [20,21,22]
ageclass_dataset = titanic_data[titanic_data["Age"].isin(ages)
& (titanic_data["Pclass"] == 1)]
ageclass_dataset.head()
Output:
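The way isin() combines with other conditions can be sketched on a small, made-up dataframe. The & operator keeps rows where both conditions hold, while | keeps rows where at least one holds. Each comparison is wrapped in parentheses because & and | bind more tightly than ==.

```python
import pandas as pd

# Hypothetical records, used only for illustration
people = pd.DataFrame({
    "Age": [20, 21, 35, 22, 40],
    "Pclass": [1, 3, 1, 1, 3],
})
ages = [20, 21, 22]

# Rows whose age is in the list AND whose class is 1
both = people[people["Age"].isin(ages) & (people["Pclass"] == 1)]

# Rows whose age is in the list OR whose class is 1
either = people[people["Age"].isin(ages) | (people["Pclass"] == 1)]

print(both)
print(either)
```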
Script 7:
The output below shows that the dataset now contains only
Name, Sex, and Age columns.
Output:
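Selecting a subset of columns can be sketched as follows, assuming a small, made-up dataframe. The sketch uses the filter() method of the Pandas dataframe with its items parameter and compares the result with plain double-bracket selection; the original script may differ in its details.

```python
import pandas as pd

# Hypothetical records, used only for illustration
people = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Sex": ["female", "male", "male"],
    "Age": [29.0, 41.0, 35.0],
    "Fare": [71.3, 8.05, 13.0],
})

# Keep only the listed columns, in the listed order
subset = people.filter(items=["Name", "Sex", "Age"])

# Equivalent selection with a list of column names
same_subset = people[["Name", "Sex", "Age"]]

print(subset.head())
```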
Script 8:
Output:
Script 9:
titanic_pclass1_data = titanic_data[titanic_data.Pclass == 1]
print(titanic_pclass1_data.shape)
titanic_pclass2_data = titanic_data[titanic_data.Pclass == 2]
print(titanic_pclass2_data.shape)
Output:
(216, 12)
(184, 12)
Script 10:
final_data = titanic_pclass1_data.append(titanic_pclass2_data,
ignore_index=True)
print(final_data.shape)
Output:
(400, 12)
The output now shows that the total number of rows is 400,
which is the sum of the number of rows in the two dataframes
that we concatenated.
Script 11:
Output:
(400, 12)
Script 12:
df1 = final_data[:200]
print(df1.shape)
df2 = final_data[200:]
print(df2.shape)
Output:
(200, 12)
(200, 12)
(400, 24)
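The shapes reported above follow a simple rule that can be sketched with the Pandas concat() function on two small, made-up dataframes: stacking vertically adds up the rows, while joining side by side (axis=1) adds up the columns.

```python
import pandas as pd

# Hypothetical dataframes with the same two columns
first = pd.DataFrame({"Pclass": [1, 1], "Fare": [71.3, 53.1]})
second = pd.DataFrame({"Pclass": [2, 2, 2], "Fare": [13.0, 26.0, 10.5]})

# Vertical concatenation: 2 rows + 3 rows, still 2 columns
stacked = pd.concat([first, second], ignore_index=True)

# Horizontal concatenation of two 2-row frames: 2 rows, 2 + 2 columns
side_by_side = pd.concat([first, second.head(2)], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```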
Script 13:
age_sorted_data = titanic_data.sort_values(by=['Age'])
age_sorted_data.head()
Output:
Script 14:
age_sorted_data = titanic_data.sort_values(by=['Age'],
ascending = False)
age_sorted_data.head()
Output:
Script 15:
age_sorted_data = titanic_data.sort_values(by=['Age','Fare'],
ascending = False)
age_sorted_data.head()
Output:
Script 16:
updated_class = titanic_data.Pclass.apply(lambda x : x + 2)
updated_class.head()
The output shows that all the values in the Pclass column have
been incremented by 2.
Output:
0 5
1 3
2 5
3 3
4 5
Script 17:
def mult(x):
    return x * 2
updated_class = titanic_data.Pclass.apply(mult)
updated_class.head()
Output:
0 6
1 2
2 6
3 2
4 6
Name: Pclass, dtype: int64
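The two outputs above can be reproduced on a five-value series that mirrors the first five Pclass values shown. The sketch also includes a plain arithmetic expression, which gives the same result as apply() for simple operations like these.

```python
import pandas as pd

# The first five Pclass values implied by the outputs above
pclass = pd.Series([3, 1, 3, 1, 3], name="Pclass")

# apply() with a lambda expression: add 2 to every value
plus_two = pclass.apply(lambda x: x + 2)

# apply() with a named function: multiply every value by 2
def mult(x):
    return x * 2

doubled = pclass.apply(mult)

# Arithmetic on the whole series at once gives the same values
plus_two_direct = pclass + 2

print(plus_two.tolist())
print(doubled.tolist())
```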
Script 18:
flights_data = sns.load_dataset('flights')
flights_data.head()
Output:
Script 19:
flights_data_pivot = flights_data.pivot_table(index='month',
columns='year', values='passengers')
flights_data_pivot.head()
Output:
Script 20:
import pandas as pd
titanic_data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\titanic_data.csv")
titanic_data.head()
pd.crosstab(titanic_data.Pclass, titanic_data.Age,
margins=True)
Output:
Script 21:
import numpy as np
titanic_data.Fare = np.where( titanic_data.Age > 20,
titanic_data.Fare +5 , titanic_data.Fare)
titanic_data.head()
Output:
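The conditional update performed with the where() function of NumPy can be sketched on four made-up records. The first argument is the condition, the second is the value used where the condition is true, and the third is the value used where it is false.

```python
import numpy as np
import pandas as pd

# Hypothetical records, used only for illustration
people = pd.DataFrame({
    "Age": [18.0, 25.0, 40.0, 20.0],
    "Fare": [10.0, 20.0, 30.0, 40.0],
})

# Add 5 to the fare only for passengers older than 20; keep the rest unchanged
people["Fare"] = np.where(people["Age"] > 20, people["Fare"] + 5, people["Fare"])

print(people)
```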
Exercise 6.1
Question 1
A- 0
B- 1
C- 2
D- None of the above
Question 2
A- sort_dataframe()
B- sort_rows()
C- sort_values()
D- sort_records()
Question 3
A- filter()
B- filter_columns()
C- apply_filter()
D- None of the above
Exercise 6.2
Use the apply function to subtract 10 from the Fare column of
the Titanic dataset, without using lambda expression.
References
1 https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.filter.html
2 https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.append.html
3 https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.concat.html
4 https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.sort_values.html
5 https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.apply.html
7
Pandas for
Data Visualization
7.1. Introduction
In the previous chapter, you saw how to work with the Pandas
library for data analysis. You saw how to read CSV files into the
Pandas dataframe, and how to analyze data by performing a
variety of functions on the Pandas dataframe. In this chapter,
you will see how the Pandas library can be used to plot different
types of visualizations. As a matter of fact, the Pandas library
is probably the easiest library for data plotting, as you will see
in this chapter.
Script 1:
import pandas as pd
titanic_data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\titanic_data.csv")
titanic_data.head()
Output:
dataframe name and then append the plot name via dot
operator. The following script plots a histogram for the Age
column of the Titanic dataset using the hist() function. It is
important to mention that behind the scenes, the Pandas library
makes use of the Matplotlib plotting functions. Therefore, you
need to import the Matplotlib’s pyplot module before you
can plot Pandas visualizations.
Script 2:
Output:
The other way to plot a graph via Pandas is by using the plot()
function. The type of plot you want to plot is passed to the
kind attribute of the plot() function. The following script uses
the plot() function to plot a histogram for the Age column of
the Titanic dataset.
Script 3:
Output:
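A minimal sketch of the plot() function with the kind attribute, assuming a short, made-up series of ages in place of the Age column. The bins value of 5 is chosen only to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages, used only for illustration
ages = pd.Series([22, 38, 26, 35, 35, 54, 2, 27, 14, 4], name="Age")

# kind="hist" asks plot() for a histogram; bins sets the number of bars
ax = ages.plot(kind="hist", bins=5)

# The bar heights add up to the number of values in the series
total_count = sum(bar.get_height() for bar in ax.patches)
print(total_count)
```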
Script 4:
Output:
Script 5:
Output:
Script 6:
flights_data = sns.load_dataset('flights')
flights_data.head()
Output:
Script 7:
Output:
Similarly, you can change the color of the line plot via the color
attribute, as shown below.
Script 8:
Output:
Script 9:
Output:
The output shows that for each year, we have multiple values.
This is because each year has 12 months. However, the overall
trend remains the same and the number of passengers
traveling by air increases as the years pass.
Script 10:
flights_data.plot.scatter(x='year', y='passengers',
figsize=(8,6))
Output:
Like a line plot and histogram, you can also change the color
of a scatter plot by passing the color name as the value for the
color attribute. Look at the following script.
Script 11:
flights_data.plot.scatter(x='year', y='passengers',
color='red', figsize=(8,6))
Output:
Script 12:
print(sex_mean)
print(type(sex_mean.tolist()))
Output:
Sex
female 27.915709
male 30.726645
Name: Age, dtype: float64
<class 'list'>
Script 13:
Output:
You can also plot horizontal bar plots via the Pandas library.
To do so, you need to call the barh() function, as shown in the
following example.
Script 14:
Output:
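A horizontal bar plot can be sketched as follows, assuming a small, made-up dataframe. The mean age per gender is computed with groupby() first, and the resulting series is then plotted with barh(), so the length of each bar equals the corresponding mean.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical records, used only for illustration
people = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Age": [20.0, 30.0, 40.0, 50.0],
})

# Mean age for each gender; the index holds the category names
sex_mean = people.groupby("Sex")["Age"].mean()

# barh() draws one horizontal bar per index entry
ax = sex_mean.plot.barh(color="green")

bar_lengths = [bar.get_width() for bar in ax.patches]
print(bar_lengths)
```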
Finally, like all the other Pandas plots, you can change the color
of both vertical and horizontal bar plots by passing the color
name to the color attribute of the corresponding function.
Script 15:
Output:
Script 16:
Output:
Script 17:
tips_data = sns.load_dataset('tips')
The output shows that most of the time, the tip is between
two and four dollars.
Output:
Script 18:
tips_data.plot.hexbin(x='total_bill', y='tip', gridsize=20,
figsize=(8,6), color = 'blue')
Output:
Script 19:
Output:
Script 20:
Output:
In this section, you will see how to plot time series data with
Pandas. You will work with Google Stock Price data from
7th January 2015 to 7th January 2020. The dataset is available
in the resources folder by the name google_data.csv. The
following script reads the data into a Pandas dataframe.
Script 21:
Output:
Script 22:
google_stock['Date'] = google_stock['Date'].apply(pd.to_datetime)
google_stock.set_index('Date', inplace=True)
google_stock.plot.line(y='Open', figsize=(12,8))
Output:
Script 23:
google_stock.resample(rule='A').mean()
Output:
Similarly, to plot the monthly mean values for all the columns
in the Google stock dataset, you will need to pass M as a value
for the rule attribute, as shown below.
Script 24:
google_stock.resample(rule='M').mean()
Output:
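Resampling can be sketched on a made-up daily series that spans January and February 2020. The sketch uses the MS rule (month start) instead of M, which is an assumption made so that the example behaves the same across Pandas versions; some newer releases warn about the M and A aliases. The idea is identical: all the rows that fall in one period are replaced by their mean.

```python
import numpy as np
import pandas as pd

# 60 consecutive days starting on 1 January 2020 (31 in January, 29 in February)
dates = pd.date_range("2020-01-01", periods=60, freq="D")

# Hypothetical opening prices: simply 0.0, 1.0, ..., 59.0
prices = pd.Series(np.arange(60, dtype=float), index=dates, name="Open")

# One mean value per calendar month, labeled with the first day of the month
monthly_mean = prices.resample("MS").mean()

print(monthly_mean)
```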
Script 25:
google_stock['Open'].resample('A').mean()
Output:
Date
2015-12-31 602.676217
2016-12-31 743.732459
2017-12-31 921.121193
2018-12-31 1113.554101
2019-12-31 1187.009821
2020-12-31 1346.470011
Freq: A-DEC, Name: Open, dtype: float64
The list of possible values for the rule attribute is given below:
Script 26:
google_stock['Open'].resample('A').mean().plot(kind='bar',
figsize=(8,6))
Output:
Similarly, here is the line plot for the yearly mean opening
stock prices for Google stock over a period of five years.
Script 27:
google_stock['Open'].resample('A').mean().plot(kind='line',
figsize=(8,6))
Output:
Script 28:
google_stock.shift(3).head()
Output:
You can see that the first three rows now contain null values,
while what previously was the first record has now been
shifted to the 4th row.
In the same way, you can shift rows backward. To do so, you
have to pass a negative value to the shift function.
Script 29:
google_stock.shift(-3).tail()
Output:
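The effect of shift() in both directions can be sketched on a five-value series with made-up prices. A positive argument pushes values down and leaves nulls at the top, whereas a negative argument pulls values up and leaves nulls at the bottom.

```python
import pandas as pd

# Hypothetical prices, used only for illustration
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Shift forward by three rows: the first three rows become null
forward = prices.shift(3)

# Shift backward by three rows: the last three rows become null
backward = prices.shift(-3)

print(forward)
print(backward)
```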
Exercise 7.1
Question 1
A- set_color()
B- define_color()
C- color()
D- None of the above
Question 2
A- horz_bar()
B- barh()
C- bar_horizontal()
D- horizontal_bar()
Question 3
A - shift_back(5)
B - shift(5)
C - shift_behind(-5)
D - shift(-5)
Exercise 7.2
Display a bar plot using the Titanic dataset that displays the
average age of the passengers who survived vs. those who
did not survive.
References
1. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.hist.html
2. https://pandas.pydata.org/pandas-docs/version/0.23/
generated/pandas.DataFrame.plot.line.html
3. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.plot.scatter.html
4. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.plot.bar.html
5. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.boxplot.html
6. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.plot.hexbin.html
7. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.plot.kde.html
8. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.Series.resample.html
9. https://pandas.pydata.org/pandas-docs/stable/reference/api/
pandas.DataFrame.shift.html
8
3D Plotting with Matplotlib
In the second and third chapters of this book, you saw how
the Matplotlib library can be used to plot two-dimensional
(2D) plots. In fact, in all the previous chapters, you saw how
to plot 2D plots with different Python libraries. In this chapter,
you will briefly see how the Matplotlib library can be used to
plot 3D plots.
Script 1:
figure1 = plt.figure()
axis1 = figure1.add_subplot(projection='3d')
x = [1,7,6,3,2,4,9,8,1,9]
y = [4,6,1,8,3,7,9,1,2,4]
z = [6,4,9,2,7,8,1,3,4,9]
axis1.plot(x,y,z)
axis1.set_xlabel('X-axis')
axis1.set_ylabel('Y-axis')
axis1.set_zlabel('Z-axis')
plt.show()
Output:
Script 2:
plt.rcParams["figure.figsize"] = [10,8]
tips_data = sns.load_dataset('tips')
tips_data.head()
Output:
Script 3:
bill = tips_data['total_bill'].tolist()
tip = tips_data['tip'].tolist()
size = tips_data['size'].tolist()
Finally, the following script plots a 3D line plot that shows the
relationship between the total_bill, tip, and size columns of
the tips dataset.
Script 4:
figure2 = plt.figure()
axis2 = figure2.add_subplot(projection='3d')
axis2.plot(bill,tip,size)
axis2.set_xlabel('bill')
axis2.set_ylabel('tip')
axis2.set_zlabel('size')
plt.show()
Output:
Script 5:
figure2 = plt.figure()
axis2 = figure2.add_subplot(projection='3d')
axis2.scatter(bill,tip,size)
axis2.set_xlabel('bill')
axis2.set_ylabel('tip')
axis2.set_zlabel('size')
plt.show()
Output:
Script 6:
figure2 = plt.figure()
axis3 = figure2.add_subplot(projection='3d')
x3 = bill
y3 = tip
z3 = np.zeros(tips_data.shape[0])
dx = np.ones(tips_data.shape[0])
dy = np.ones(tips_data.shape[0])
dz = bill
axis3.set_xlabel('bill')
axis3.set_ylabel('tip')
axis3.set_zlabel('size')
plt.show()
Output:
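The six arrays prepared in the script above match the arguments of Matplotlib's bar3d() method: the x, y, and z anchor positions of each bar, followed by its width (dx), depth (dy), and height (dz). The following sketch shows such a call on three made-up records. Using the size values as bar heights is an assumption made here so that the z-axis label matches the data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values, used only for illustration
bill = [10.0, 20.0, 30.0]
tip = [1.0, 3.0, 5.0]
size = [2, 3, 4]
n = len(bill)

figure3 = plt.figure()
axis3 = figure3.add_subplot(projection="3d")

# Bars start at z = 0, are one unit wide and deep, and rise to the size value
bars = axis3.bar3d(bill, tip, np.zeros(n), np.ones(n), np.ones(n), size)

axis3.set_xlabel("bill")
axis3.set_ylabel("tip")
axis3.set_zlabel("size")
```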
Exercise 8.1
Plot a scatter plot that shows the distribution of pclass, age,
and fare columns from the Titanic dataset.
References
1. https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#line-plots
2. https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#scatter-plots
3. https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#bar-plots
9
Interactive Data Visualization
with Bokeh
In all the chapters so far, you have been plotting static
graphs. In this chapter and the next one, you will see how
to plot interactive graphs. Interactive graphs are the type of
graphs that show different information based on the actions
performed by the users. In this chapter, you will see how to
plot interactive plots with Python’s Bokeh library. In the next
chapter, you will see how to plot interactive plots with Plotly.
9.1. Installation
Use the pip installer to install the Bokeh library. To do so,
execute the following command on your command line.
pip install bokeh
Script 1:
import pandas as pd
import numpy as np
%matplotlib inline
import seaborn as sns
flights_data = sns.load_dataset('flights')
flights_data.head()
Output:
Script 2:
Script 3:
output_file('E:/bokeh.html')
Script 4:
plot = figure(
title = 'Years vs Passengers',
x_axis_label = 'Year',
y_axis_label = 'Passengers',
plot_width=600,
plot_height=400
)
Next, you need data sources that you will use to plot a graph.
We will be plotting the year against the number of passengers.
Script 5:
year = flights_data['year']
passengers = flights_data['passengers']
Finally, to create a line plot, you have to pass the list of values
for the x- and y-axis to the line() function of the figure class
object. The line_width attribute here is used to set the width
of the line.
Script 6:
At this point, the plot has been created and saved.
However, to display the plot, you have to call the show()
method, as shown below.
Script 7:
show(plot)
Output:
Script 8:
month_passengers = flights_data.groupby("month")["passengers"].mean()
print(month_passengers.index.tolist())
print(month_passengers.tolist())
Output:
Script 9:
plot2 = figure(
x_range = month_passengers.index.tolist(),
title = 'Month vs Passengers',
x_axis_label = 'Month',
y_axis_label = 'Passengers',
plot_height=400
)
Script 10:
Output:
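A minimal sketch of a Bokeh bar plot, assuming the Bokeh library is installed and using three made-up monthly averages. The category names are passed to x_range when the figure is created, and the bar heights are passed to the top attribute of the vbar() function. The call to show() is left out so that the sketch does not open a browser window.

```python
from bokeh.plotting import figure

# Hypothetical monthly averages, used only for illustration
months = ["Jan", "Feb", "Mar"]
mean_passengers = [242.0, 235.0, 270.0]

# x_range lists the categories that appear on the x-axis
plot = figure(
    x_range=months,
    title="Month vs Passengers",
    x_axis_label="Month",
    y_axis_label="Passengers",
)

# x gives the category of each bar and top gives its height
bars = plot.vbar(x=months, top=mean_passengers, width=0.5)
```

Calling show(plot) afterward, as in the earlier scripts, would display the bar plot.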
Script 11:
plot3 = figure(
title = 'Years vs Passengers',
x_axis_label = 'Year',
y_axis_label = 'Passengers',
plot_width=600,
plot_height=400
)
Script 12:
year = flights_data['year']
passengers = flights_data['passengers']
Script 13:
plot3.scatter(year, passengers, legend='Years vs Passengers',
line_width=2)
show(plot3)
Output:
Let’s plot another scatter plot using the tips dataset. The
scatter plot shows the values from the total_bill column
on the x-axis and the values from the tip column on the y-axis.
The following script loads the tips dataset.
Script 14:
tips_data = sns.load_dataset('tips')
tips_data.head()
Output:
Script 14:
plot4 = figure(
title = 'Total Bill vs Tips',
x_axis_label = 'Total Bill',
y_axis_label = 'Tips',
plot_width=600,
plot_height=400
)
Script 15:
total_bill = tips_data['total_bill']
tips = tips_data['tip']
Script 16:
Output:
Script 17:
plot5 = figure(
title = 'Total Bill vs Tips',
x_axis_label = 'Total Bill',
y_axis_label = 'Tips',
plot_width=600,
plot_height=400
)
Script 18:
Output:
In this chapter, you saw how to plot interactive plots via the
Bokeh library. In the next chapter, you will see how to plot
interactive plots via the Plotly library, which is yet another
useful library for interactive data plotting.
Exercise 9.1
Question 1
A- figure()
B- width()
C- height()
D- None of the above
Question 2
A- line
B- width
C- line_width
D- length
Question 3
In the Bokeh library, the list of values used to plot bar plots is
passed to the following attribute of the bar plot:
A- values
B- legends
C- y
D- top
Exercise 9.2
Plot a bar plot using the Titanic dataset that displays the
average age of both male and female passengers.
References
1. https://docs.bokeh.org/en/latest/docs/reference/plotting.
html#bokeh.plotting.figure.Figure.line
2. https://docs.bokeh.org/en/latest/docs/reference/plotting.
html#bokeh.plotting.figure.Figure.vbar
3. https://docs.bokeh.org/en/latest/docs/reference/plotting.
html#bokeh.plotting.figure.Figure.circle
10
Interactive Data
Visualization with Plotly
10.1 Installation
To plot interactive plots with Plotly, you have to first download
the Plotly library using the following script.
pip install plotly
Before we plot with Plotly library, let’s first import the required
libraries:
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)
import cufflinks as cf
cf.go_offline()
Script 1:
flights_data = sns.load_dataset('flights')
flights_data.head()
Output:
Let’s first plot a very simple line plot using Pandas only. To
do so, you need to select the column for which you want to
plot a static line plot and then call the “plot()” method. The
following script plots a line plot for the passengers column of the
flights dataset.
Script 2:
dataset_filter = flights_data[["passengers"]]
dataset_filter.plot()
Output:
Now to plot an interactive line plot, you have to call the iplot()
method on the Pandas dataframe as shown below.
Script 3:
dataset_filter.iplot()
Output:
Script 4:
If you hover the mouse over the plot below, you will see the actual number
of passengers traveling in a specific month. The output shows
that the maximum number of passengers travel in the months
of July and August, probably due to vacation.
Output:
To plot horizontal bar plots, you have to pass barh as the value
for the kind attribute of the iplot() function. Look at the
following example.
Script 5:
flights_data.iplot(kind='barh', x=['month'], y='passengers')
Output:
Script 6:
flights_data.iplot(kind='scatter', x='month', y='passengers',
mode='markers')
Output:
Let’s plot a scatter plot using the tips dataset. The following
script imports the tips dataset from the Seaborn library.
Script 7:
tips_data = sns.load_dataset('tips')
tips_data.head()
Output:
Script 8:
The output shows that with the increase in the total bill, the
corresponding tip also increases.
Output:
Script 9:
tips_data.iplot(kind='box')
Output:
10.6. Histogram
Histograms show the distribution of values in a numeric
column. Let’s plot the histogram for the age column of the
Titanic dataset. To do so, you first need to import the Titanic
dataset using the following script.
Script 10:
Output:
Script 11:
titanic_data['Age'].iplot(kind='hist', bins=25)
Output:
Exercise 10.1
Question 1
A- plot()
B- iplot()
C- draw()
D- idraw()
Question 2
A- shape, markers
B- shape, scatter
C- mode, marker
D- mode, scatter
Question 3
A- histogram()
B- histo()
C- hist()
D- none of the above
Answer: C
Exercise 10.2
Plot an interactive histogram for the PClass column of the
Titanic dataset.
References
1. https://plot.ly/python/v3/ipython-notebooks/cufflinks/#line-charts
2. https://plot.ly/python/v3/ipython-notebooks/cufflinks/#bar-charts
3. https://plot.ly/python/v3/ipython-notebooks/cufflinks/#scatter-plot
4. https://plot.ly/python/v3/ipython-notebooks/cufflinks/#box-plots
5. https://plot.ly/python/v3/ipython-notebooks/cufflinks/#histograms
Hands-on Project
Script 1:
Output:
Script 2:
data_columns = customer_churn.columns.values.tolist()
print(data_columns)
Output:
Script 3:
sns.pairplot(data=customer_churn)
Output:
Script 4:
plt.rcParams["figure.figsize"] = [10,8]
After the pair plot, you are free to choose whichever plot
suits the task at hand. Let’s see if gender plays any role in
customer churn. You can plot a bar chart for that, as shown
below.
Script 5:
Output:
The output shows that 25 percent of the women left the bank
compared to 15 percent of the men, which means that women
are more likely to leave the bank than men.
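Percentages of this kind can be computed by averaging a 0/1 column within each group. The sketch below assumes the Gender and Exited column names used elsewhere in this project; the rows are made up for illustration, so the resulting numbers do not match the bank dataset.

```python
import pandas as pd

# Hypothetical mini-dataset; the column names follow the Gender and Exited
# columns used in this project, but the rows are invented.
sample = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female",
               "Male", "Male", "Male", "Male"],
    "Exited": [1, 0, 0, 0, 0, 0, 1, 1],
})

# Exited is 0 or 1, so its mean per group is the fraction of customers
# in that group who left; multiplying by 100 gives a percentage.
churn_rate = sample.groupby("Gender")["Exited"].mean() * 100
print(churn_rate)
```

Applied to the real data, the same groupby call would be run on customer_churn instead of the sample frame.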
Let’s now plot a histogram for the Age column of our dataset.
Script 6:
plt.title('Age Histogram')
plt.hist(customer_churn["Age"])
Output:
Script 7:
plt.scatter(customer_churn["Age"], customer_churn["EstimatedSalary"], c='g')
Output:
Script 8:
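countries = customer_churn["Geography"].value_counts()
labels = countries.index.values.tolist()
values = countries.values.tolist()
explode = (0.05, 0.05, 0.05)
# Assumed completion: draw the pie chart from the values prepared above.
plt.pie(values, explode=explode, labels=labels, autopct='%1.1f%%')
plt.show()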
Output:
Let’s plot a box plot showing the age distribution, split by
gender, of the customers who left the bank and of those who
stayed.
Script 9:
sns.boxplot(x='Exited', y='Age', hue='Gender', data=customer_churn)
The output shows that the median age of the customers who
left the bank is slightly higher than that of those who didn’t
leave the bank.
Output:
Script 10:
sns.violinplot(x='Exited', y='Age', hue='Gender', data=customer_churn)
Output:
Script 11:
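# numeric_only=True (pandas 1.5 or later) skips text columns such as Geography and Gender.
corr_values = customer_churn.corr(numeric_only=True)
sns.heatmap(corr_values, annot=True)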
Output:
Script 12:
import pandas as pd
import numpy as np
%matplotlib inline
Output:
Exercise Solutions
§§ Exercise 1.1
Question 1
A- For Loop
B- While Loop
C- Both A and B
D- None of the above
Answer: A
Question 2
A- Single Value
B- Double Value
C- More than two values
D- None
Answer: C
Question 3
A- In
B- Out
C- Not In
D- Both A and C
Answer: D
§§ Exercise 1.2
Print the multiplication table of 9 using a while loop.
Solution
j = 1
while j < 11:
    print("9 x " + str(j) + " = " + str(9 * j))
    j = j + 1
Output:
9 x 1 = 9
9 x 2 = 18
9 x 3 = 27
9 x 4 = 36
9 x 5 = 45
9 x 6 = 54
9 x 7 = 63
9 x 8 = 72
9 x 9 = 81
9 x 10 = 90
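The same while loop can also be wrapped in a function so that it works for any number. This is only a sketch of one possible variation; times_table is a hypothetical helper name and is not part of the original solution.

```python
def times_table(number, upto=10):
    """Return the lines of the multiplication table for `number`."""
    lines = []
    j = 1
    while j <= upto:
        # Build one line of the table, e.g. "9 x 3 = 27".
        lines.append(f"{number} x {j} = {number * j}")
        j += 1
    return lines


for line in times_table(9):
    print(line)
```

Calling times_table(7) or times_table(9, upto=12) reuses the loop without rewriting it.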
§§ Exercise 2.1
Question 1
A- color
B- c
C- r
D- None of the above
Answer: C
Question 2
A- title
B- label
C- axis
D- All of the above
Answer: B
Question 3
A - autopct = '%1.1f%%'
B - percentage = '%1.1f%%'
C - perc = '%1.1f%%'
D - None of the Above
Answer: A
§§ Exercise 2.2
Create a pie chart that shows the distribution of passengers
by gender aboard the ill-fated Titanic. You can use the
Titanic dataset for that purpose.
Solution:
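import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv(r"E:\Data Visualization with Python\Datasets\titanic_data.csv")
data.head()
sex = data["Sex"].value_counts()
print(sex)
labels = sex.index.values.tolist()
values = sex.values.tolist()
explode = (0.05, 0.05)
# Assumed completion: draw the pie chart from the values prepared above.
plt.pie(values, explode=explode, labels=labels, autopct='%1.1f%%')
plt.show()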
Output:
§§ Exercise 3.1
Question 1
Which plot function will you use to plot a graph in the 5th
cell of a multi-plot figure with four rows and two
columns?
A- plt.subplot(5,4,2)
B- plt.subplot(2,4,5)
C- plt.subplot(4,2,5)
D- None of the Above
Answer: C
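The arguments of plt.subplot() are the number of rows, the number of columns, and a cell index that starts at 1 and counts row by row. The sketch below mirrors that numbering in plain Python; subplot_position is a hypothetical helper written for illustration and is not a Matplotlib function.

```python
def subplot_position(nrows, ncols, index):
    """Return the 1-based (row, column) of a plt.subplot(nrows, ncols, index) cell."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows * ncols")
    # Cells are numbered left to right, then top to bottom.
    row = (index - 1) // ncols + 1
    col = (index - 1) % ncols + 1
    return row, col


print(subplot_position(4, 2, 5))  # (3, 1): third row, first column
```

With four rows and two columns, the fifth cell is therefore the first cell of the third row.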
Question 2
How will you create a subplot with five rows and three
columns using the subplots() function?
A- plt.subplots(nrows=5, ncols=3)
B- plt.subplots(5,3)
C- plt.subplots(rows=5, cols=3)
D- All of the Above
Answer: A
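The keyword form in option A can be tried directly. As a small sketch, the code below also calls the positional form, which Matplotlib accepts because nrows and ncols are the first two parameters of subplots(); the rows and cols keywords of option C do not exist. The variable names are chosen for illustration only.

```python
import matplotlib.pyplot as plt

# Both calls request a grid of five rows and three columns.
fig_kw, axes_kw = plt.subplots(nrows=5, ncols=3)
fig_pos, axes_pos = plt.subplots(5, 3)

# subplots() returns the figure plus a 2-D array of Axes objects.
print(axes_kw.shape, axes_pos.shape)

# Individual cells are reached by zero-based [row, column] indexing.
axes_kw[0, 2].set_title("row 1, column 3")

plt.close("all")  # release the figures once they are no longer needed
```

Note that the array returned by subplots() is indexed from zero, whereas the cell index of plt.subplot() starts at 1.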
Question 3
A- figure.saveimage()
B- figure.savegraph()
C- figure.saveplot()
D- figure.savefig()
Answer: D
§§ Exercise 3.2
Draw multiple plots with three rows and one column. Show
the sine of any 30 integers in the first plot, the cosine of the
same 30 integers in the second plot, and the tangent of the
same 30 integers in the third plot.
Solution:
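import numpy as np
import matplotlib.pyplot as plt

# Assumed setup: the integers 0 to 29 and their sine, cosine, and tangent.
x_vals = np.arange(30)
y1_vals = np.sin(x_vals)
y2_vals = np.cos(x_vals)
y3_vals = np.tan(x_vals)

plt.rcParams["figure.figsize"] = [12,8]
plt.subplot(3,1,1)
plt.plot(x_vals, y1_vals, 'bo-')
plt.subplot(3,1,2)
plt.plot(x_vals, y2_vals, 'rx-')
plt.subplot(3,1,3)
plt.plot(x_vals, y3_vals, 'g*-')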
Output:
§§ Exercise 4.1
Question 1
Which plot is used to plot multiple joint plots for all the
combinations of numeric and Boolean columns in a dataset?
A- Joint Plot
B- Pair Plot
C- Dist Plot
D- Scatter Plot
Answer: B
Question 2
A- barplot()
B- jointplot()
C- catplot()
D- mulplot()
Answer: C
Question 3
A- kind
B- type
C- hue
D- col
Answer: A
§§ Exercise 4.2
Plot a swarm plot using the Titanic data that
displays the fare paid by male and female passengers.
Further, categorize the plot by passengers who survived and
by those who didn’t.
Solution:
# dodge=True separates the hue levels (older Seaborn releases called this argument split).
sns.swarmplot(x='sex', y='fare', hue='survived', data=titanic_data, dodge=True)
Output:
§§ Exercise 5.1
Question 1
A- set_style('darkgrid')
B- set_style('whitegrid')
C- set_style('poster')
D- set_context('poster')
Answer: D
Question 2
A- correlation()
B- corr()
C- heatmap()
D- none of the above
Answer: B
Question 3
A- annotate()
B- annot()
C- mark()
D- display()
Answer: B
§§ Exercise 5.2
Plot two scatter plots on the same graph using the tips
dataset. In the first scatter plot, display values from the
total_bill column on the x-axis and from the tip column on the
y-axis. The color of the first scatter plot should be green. In the
second scatter plot, display values from the total_bill column
on the x-axis and from the size column on the y-axis. The color
of the second scatter plot should be blue, and the markers
should be x.
Solution:
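One possible way to approach this exercise is sketched below with Matplotlib. It is an illustration under stated assumptions rather than the book's script: the small DataFrame holds a few illustrative rows with the tips column names, and in practice it would be replaced by sns.load_dataset('tips').

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few illustrative rows using the tips column names.
# With Seaborn available: tips_data = sns.load_dataset('tips')
tips_data = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
})

fig, ax = plt.subplots()
# First scatter plot: total_bill vs. tip, drawn in green.
ax.scatter(tips_data["total_bill"], tips_data["tip"], c="g", label="tip")
# Second scatter plot: total_bill vs. size, drawn in blue with x markers.
ax.scatter(tips_data["total_bill"], tips_data["size"], c="b", marker="x",
           label="size")
ax.set_xlabel("total_bill")
ax.legend()
```

Drawing both scatter calls on the same axes is what places the two plots on one graph; in a script, plt.show() displays the figure.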
Output:
§§ Exercise 6.1
Question 1
A- 0
B- 1
C- 2
D- None of the above
Answer: B
Question 2
A- sort_dataframe()
B- sort_rows()
C- sort_values()
D- sort_records()
Answer: C
Question 3
A- filter()
B- filter_columns()
C- apply_filter ()
D- None of the above
Answer: A
§§ Exercise 6.2
Use the apply function to subtract 10 from the Fare column of
the Titanic dataset, without using lambda expression.
Solution:
def subt(x):
    return x - 10
updated_class = titanic_data.Fare.apply(subt)
updated_class.head()
Output:
0    -2.7500
1    61.2833
2    -2.0750
3    43.1000
4    -1.9500
Name: Fare, dtype: float64
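As a quick check of what apply() does with a named function, the sketch below runs the same subt function on a small Series and compares the result with plain vectorized subtraction. The fares are hypothetical values and are not taken from the Titanic file.

```python
import pandas as pd


def subt(x):
    # Called once for each element of the Series.
    return x - 10


# Hypothetical fares, not taken from the Titanic file.
fares = pd.Series([20.0, 35.5, 10.0], name="Fare")

applied = fares.apply(subt)  # element-wise call of the named function
vectorized = fares - 10      # the same arithmetic on the whole Series

print(applied.tolist())
print(applied.equals(vectorized))
```

For simple arithmetic like this, the vectorized form is usually shorter and faster; apply() becomes useful when the per-element logic is more involved.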
§§ Exercise 7.1
Question 1
A- set_color()
B- define_color()
C- color()
D- None of the above
Answer: C
Question 2
A- horz_bar()
B- barh()
C- bar_horizontal()
D- horizontal_bar()
Answer: B
Question 3
A - shift_back(5)
B - shift(5)
C - shift_behind(-5)
D - shift(-5)
Answer: D
§§ Exercise 7.2
Display a bar plot using the Titanic dataset that displays the
average age of the passengers who survived vs. those who
did not survive.
Solution:
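One way to approach this exercise with Pandas plotting is sketched below. It is a minimal illustration under stated assumptions: the Survived and Age column names follow the capitalized Titanic CSV columns used in the other solutions, and the rows are hypothetical, so the averages are not the real ones.

```python
import pandas as pd

# Hypothetical rows with Titanic-style column names (Survived, Age).
titanic_sample = pd.DataFrame({
    "Survived": [0, 0, 1, 1, 1],
    "Age": [22.0, 40.0, 30.0, 20.0, 40.0],
})

# Mean age per survival group (missing ages are skipped by mean()).
mean_age = titanic_sample.groupby("Survived")["Age"].mean()
print(mean_age)

# Pandas draws the bar plot through Matplotlib and returns the Axes.
ax = mean_age.plot(kind="bar", title="Average age by survival")
ax.set_ylabel("Age")
```

With the full dataset loaded as titanic_data, the same groupby and plot calls would be applied to that DataFrame instead of the sample.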
Output:
§§ Exercise 8.1
Plot a scatter plot that shows the distribution of pclass, age,
and fare columns from the Titanic dataset.
Solution:
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams["figure.figsize"] = [8,6]
sns.set_style("darkgrid")
titanic_data = sns.load_dataset('titanic')
pclass = titanic_data['pclass'].tolist()
age = titanic_data['age'].tolist()
fare = titanic_data['fare'].tolist()
figure4 = plt.figure()
axis4 = figure4.add_subplot(projection='3d')
# Plot the three Titanic columns extracted above.
axis4.scatter(pclass, age, fare)
axis4.set_xlabel('pclass')
axis4.set_ylabel('age')
axis4.set_zlabel('fare')
plt.show()
Output:
§§ Exercise 9.1
Question 1
A- figure()
B- width()
C- height()
D- None of the above
Answer: A
Question 2
A- line
B- width
C- line_width
D- length
Answer: C
Question 3
In the Bokeh library, the list of values used to plot bar plots is
passed to the following attribute of the bar plot:
A- values
B- legends
C- y
D- top
Answer: D
§§ Exercise 9.2
Plot a bar plot using the Titanic dataset that displays the
average age of both male and female passengers.
Solution:
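from bokeh.plotting import figure, show

sex_mean = titanic_data.groupby("Sex")["Age"].mean()
plotx = figure(
    x_range = sex_mean.index.tolist(),
    title = 'Sex vs Age',
    x_axis_label = 'Sex',
    y_axis_label = 'Age',
    plot_height = 400  # use height=400 in Bokeh 3.0 and later
)
# Assumed completion: bar heights go in the top attribute (see Exercise 9.1, Question 3).
plotx.vbar(x=sex_mean.index.tolist(), top=sex_mean.values.tolist(), width=0.5)
show(plotx)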
Output:
§§ Exercise 10.1
Question 1
A- plot()
B- iplot()
C- draw()
D- idraw()
Answer: B
Question 2
A- shape, markers
B- shape, scatter
C- mode, marker
D- mode, scatter
Answer: C
Question 3
A- histogram()
B- histo()
C- hist()
D- none of the above
Answer: C
§§ Exercise 10.2
Plot an interactive histogram for the Pclass column of the
Titanic dataset.
Solution:
titanic_data['Pclass'].iplot(kind='hist')
Output: