Data Visualization using Python

August 17, 2021

1 Introduction
Data Visualisation refers to the graphical or visual representation of information and data using
visual elements like charts, graphs, maps, etc.


Matplotlib is a 2D plotting library which produces publication-quality figures. PyPlot is a module of
the matplotlib library containing a collection of methods which allow users to create 2D plots and
graphs easily and interactively.
[1]: from matplotlib import pyplot as plt #Import Statement

1.0.2 PLOT
A plot is a graphical technique for representing a data set, usually as a graph showing the
relationship between two or more variables. Let's look at an example:
[3]: #Step 1: Import the Module
from matplotlib import pyplot as plt

#Step 2: Create the Lists

x = [0,1,2,3,4,5]
y = [0,2,4,6,8,10]

#Step 3: Not Required

#Step 4: Plotting the Graph --- For Line graph, the method we use is known as plot()
plt.plot(x, y)

#Step 5: Detailing the Plot

#Detail 1: Name of X Axis --> xlabel()
plt.xlabel('X Axis')
#Detail 2: Name of Y Axis --> ylabel()
plt.ylabel("Y Axis")
#Detail 3: Name of the Graph --> title()
plt.title('First Line Graph')

#Step 6: Saving a Graph/Plot
plt.savefig('Line1.png') # png, jpeg, pdf, svg

#Step 7: Display the Plot.
plt.show()


• PIE CHART (Not In Syllabus)
• SCATTER PLOT (Not in Syllabus)
• FREQUENCY POLYGON (Not in Syllabus)
• BOX PLOT ETC. (Not in Syllabus)

1.1 TIP: General Steps to be followed for Plotting any Graph

1. Import the necessary modules (Ex. matplotlib.pyplot and numpy)
2. Create the Arrays/Lists to be plotted into a graph
3. Plot the Graph using the proper lists and mention the details (Ex. color, width, align, legend,
etc)
4. Provide the necessary Details for the Graph (Ex. Title, XLabel, YLabel, XTicks, YTicks, Show
Legend, etc)
5. [Optional - When Required] Save the Plot
6. Display the Plot
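
The steps above can be sketched as one short program. This is only an illustration: the data values, labels and file name are made up for this example and are not taken from the notes.

```python
# Step 1: Import the necessary module
from matplotlib import pyplot as plt

# Step 2: Create the lists to be plotted (made-up values for illustration)
hours = [1, 2, 3, 4, 5]
pages_read = [12, 25, 33, 48, 60]

# Step 3: Plot the graph and mention the plot details
plt.plot(hours, pages_read, color='green', linewidth=2, label='Pages read')

# Step 4: Provide the necessary details for the graph
plt.title('Reading Progress')
plt.xlabel('Hours')
plt.ylabel('Pages')
plt.legend()  # shows the label given in Step 3

# Step 5 (optional): Save the plot; uncomment when a saved copy is required
# plt.savefig('reading_progress.png')  # illustrative file name

# Step 6: Display the plot
plt.show()
```

Saving comes before displaying because, with some backends, the figure is cleared once the plot window is closed.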

1.1.1 Line Graph

It is used to visualise data which has some kind of sequence.
Example: How distance changes with time.
Example: How the number of animals residing in a forest changes with the temperature of the place.
SYNTAX: plt.plot(data_x, data_y)
[49]: #Step 1: Import the Modules
from matplotlib import pyplot as plt

#Step 2: Create the Lists/Arrays

d = [0,5,2,7,3,4,5,2]
t = [0,1,2,3,4,5,6,7]

#Step 3: Plot the Graph

plt.plot(t,d, linestyle = '-',marker = 'o')

#Step 4: Provide the Details

plt.title("Distance vs Time")

#Step 5: Save the Plot
plt.savefig('Distance_vs_Time.png') # file name is illustrative

#Step 6: Display the Plot
plt.show()


[59]: #Step 1: Import the Modules
from matplotlib import pyplot as plt

#Step 2: Create the Arrays

year = ['2017 - 18','2018 - 19','2019 - 20','2020 - 21']
kvp = [83.4,89.7,88.7,91.2]
jnv = [87.3,88.3,82.5,90.2]
hcs = [90.2,89.0,83.7,93.5]

#Step 3: Plot The Graphs

plt.plot(year, kvp, marker = 'o', label = 'KVP')
plt.plot(year, jnv, marker = '*', label = 'JNV')
plt.plot(year, hcs, marker = '^', label = 'HCS')

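#Step 4: Provide the Details

plt.title("Result Analysis")
plt.legend() # Show the labels given in Step 3

#Step 5: Save the Graph.
plt.savefig('Result_Analysis.png') # file name is illustrative

#Step 6: Display the Graph
plt.show()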


1.2 Bar Graph

[2]: #Step 1: Import the Module

from matplotlib import pyplot as plt

#Step 2: Create the Arrays

players = ['Dhoni','Virat','Shikhar','Rishabh']
runs = [76,102,48,27]

#Step 3: Plot the Graph
plt.bar(players, runs)

#Step 4: Provide the Details
plt.title("Player Runs")

#Step 5: Display the Graph
plt.show()


#Assignment: Create the following Bar Graph
[2]: from matplotlib import pyplot as plt

seasons = ['Summer','Monsoon','Autumn','Winter','Spring']
ice_cream = [100,80,70,45,85]

plt.bar(seasons,ice_cream, linewidth = 2, edgecolor = 'black')

plt.title('Ice-Cream Per Season')

plt.ylabel('Litres of Ice-cream')




[17]: from matplotlib import pyplot as plt

month = ['Jan','Feb','Mar','Apr','May','Jun']
Year2021 = [10,12,15,25,30,32]
Year2020 = [18,10,20,25,35,40]
plt.bar(month,Year2020, width = -0.4, align = 'edge', label = '2020') #Negative width draws the bars to the left of the tick

plt.bar(month,Year2021, width = 0.4, align = 'edge', label = '2021') #Positive width draws the bars to the right of the tick

plt.title("Electricity Comparison 2020 vs 2021")
plt.legend() # Show the labels '2020' and '2021'
plt.show()
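
An alternative way to place two sets of bars side by side is to plot against numeric positions and shift each set by half a bar width. This is a sketch of a different technique from the negative-width approach used above, reusing the same data; the names `positions`, `bar_width`, `bars_2020` and `bars_2021` are introduced only for this example.

```python
from matplotlib import pyplot as plt
import numpy as np

month = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
Year2020 = [18, 10, 20, 25, 35, 40]
Year2021 = [10, 12, 15, 25, 30, 32]

positions = np.arange(len(month))  # 0, 1, 2, ... one position per month
bar_width = 0.4

# Shift each set of bars by half a bar width around the position
bars_2020 = plt.bar(positions - bar_width / 2, Year2020, width=bar_width, label='2020')
bars_2021 = plt.bar(positions + bar_width / 2, Year2021, width=bar_width, label='2021')

# Put the month names back on the X axis
plt.xticks(positions, month)

plt.title("Electricity Comparison 2020 vs 2021")
plt.legend()
plt.show()
```

This form extends more easily to three or more sets of bars, since each set only needs a different shift.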



A histogram is usually used to display the frequency of each item in the data. What is required for
such a representation is buckets/bins of the range (10-20, 20-30, 30-40, ...). Bins can be mentioned
in either of two ways:
• A list of bin edges [10,20,30,40,50,...]
• An integer giving the number of bins required. The bins are then generated by dividing the total
range of the data into intervals of equal width.
By default, the number of bins is 10.

There are 2 techniques for getting data for a histogram:
1. Use the actual data as a list/array of values and use that to plot the histogram.
2. Separate the data values and their frequency of occurrence into 2 lists and use both to plot
the histogram.
Both techniques can be used depending on the requirement of the question.
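
To see how the two ways of specifying bins behave, the sketch below uses numpy.histogram, the function that plt.hist relies on for binning, so that the bin edges and counts can be inspected without drawing anything. The data values are made up for illustration.

```python
import numpy as np

data = [12, 18, 25, 27, 31, 36, 44, 52]  # made-up values for illustration

# Bins as an integer: the range 12 to 52 is divided into 4 bins of equal width
counts_int, edges_int = np.histogram(data, bins=4)
print(edges_int)   # edges at 12, 22, 32, 42, 52
print(counts_int)  # counts 2, 3, 1, 2

# Bins as a list: the given values are used as the bin edges
counts_list, edges_list = np.histogram(data, bins=[10, 20, 30, 40, 50, 60])
print(counts_list)  # counts 2, 2, 2, 1, 1

# No bins given: 10 bins are used, so there are 11 edges
counts_default, edges_default = np.histogram(data)
print(len(edges_default))
```

Each bin includes its lower edge and excludes its upper edge, except the last bin, which includes both. That is why the value 52 is counted in the last bin in the first two cases.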


• When actual frequency data is used:- SYNTAX: plt.hist(data, bins)

• When Data and Frequency of Occurrence are in different variables:- SYNTAX:
plt.hist(data, bins, weights = frequency)

Histogram with Actual Data and Integer Bins

[4]: from matplotlib import pyplot as plt
import numpy as np

data = np.array([24,23,24,23,21,25,24,21,11,16,15,30]) # only the first 12 values of the data are shown
plt.hist(data, bins = 5, edgecolor = 'black')
plt.show()


Histogram with actual data and bins as list

[10]: from matplotlib import pyplot as plt
import numpy as np

data = np.array([24,23,24,23,21,25,24,21,11,16,15,30]) # only the first 12 values of the data are shown
plt.hist(data, bins = [10,15,20,25,30,35,40],edgecolor = 'black')
plt.show()


Histogram with Frequency Groups and Bins as Lists
[9]: from matplotlib import pyplot as plt

mark_group=[15,25,35,45] # Class Marks - (lower point + upper point) /2

Frequency = [5,10,20,5]

plt.hist(mark_group, bins = [10,20,30,40,50], weights = Frequency, edgecolor = 'black')
plt.show()



[11]: # Age 16, 17, 18, 19
from matplotlib import pyplot as plt

Age = [16,17,18,19]
Freq = [3, 18, 10, 4]

plt.hist(Age, bins = [15,16,17,18,19,20], weights = Freq, edgecolor = 'black')

plt.title('Age Frequency')
plt.xlabel('Age group')
plt.ylabel('No of Students')
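
As a check that the two techniques describe the same histogram, the sketch below rebuilds the actual data from the Age and Freq lists with numpy.repeat and compares the bar heights. The variable names `actual_ages`, `counts_weighted` and `counts_actual` are introduced only for this example.

```python
from matplotlib import pyplot as plt
import numpy as np

Age = [16, 17, 18, 19]
Freq = [3, 18, 10, 4]
bins = [15, 16, 17, 18, 19, 20]

# Technique 2: data values plus their frequencies passed as weights
counts_weighted, edges, patches = plt.hist(Age, bins=bins, weights=Freq, edgecolor='black')

# Technique 1: the actual data, i.e. 16 three times, 17 eighteen times, and so on
actual_ages = np.repeat(Age, Freq)

plt.figure()  # start a new figure so the two histograms do not overlap
counts_actual, edges, patches = plt.hist(actual_ages, bins=bins, edgecolor='black')

# Both give the same bar heights: 0, 3, 18, 10, 4
print(counts_weighted)
print(counts_actual)
```

The first bin (15 to 16) is empty because each bin includes its lower edge, so age 16 falls in the bin from 16 to 17.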

1. Plot a line graph to display growth in population in the past 7 decades. Use the following
Table Data for this purpose:-

Census Year    Population
1951           361,088,000
1961           439,235,000
1971           548,160,000
1981           683,329,000
1991           846,387,888
2001           1,028,737,436
2011           1,210,726,932

2. Plot a line graph to show Sin Curve. (HOTS) Hint: Numpy has a function, numpy.sin() to find
the sin values.

3. Plot a line graph to show Cos Curve. (HOTS) Hint: Numpy has a function, numpy.cos() to find
the cos values.

4. Plot a Bar Graph to show the number of boys in each of the classes 6 to 12. Data should be
imagined by the student.

5. Plot a Bar Graph for Marks scored in different subjects. Data should be imagined.

6. Plot a Histogram to find the number of employees coming to office between 7 am and 12 noon.
Use bins with 1-hour gaps.


