01-Matplotlib
Introduction
Matplotlib is the "grandfather" library of data visualization with Python. It was created by John Hunter to replicate the plotting
capabilities of MATLAB (another programming language) in Python. So if you happen to be familiar with MATLAB,
Matplotlib will feel natural to you.
It is an excellent 2D and 3D graphics library for generating scientific figures.
Some of the major pros of Matplotlib are:
- Generally easy to get started for simple plots
- Support for custom labels and texts
- Great control of every element in a figure
- High-quality output in many formats
- Very customizable in general
Matplotlib allows you to create reproducible figures programmatically. Let's learn how to use it! Before continuing this lecture, I
encourage you to explore the official Matplotlib web page: http://matplotlib.org/
Installation
You'll need to install matplotlib first, with either conda or pip:
conda install matplotlib
pip install matplotlib
Import the matplotlib.pyplot module under the name plt (the tidy way):
In [1]: import matplotlib.pyplot as plt
You'll also need to use this line to see plots in the notebook:
In [2]: %matplotlib inline
That line is only for Jupyter notebooks. If you are using another editor, call plt.show() at the end of all your plotting
commands to have the figure pop up in another window.
Basic Example
Let's walk through a very simple example using two Python lists.
The data we want to plot:
In [4]: year = [2018,2019,2020,2021,2022,2023]
sales = [290, 300, 350, 360, 400, 416]
In [6]: # Example
plt.plot(year, sales) # default graph is line graph
plt.title('Year Wise Sales') # for setting the title of the plot.
plt.xlabel('Year') # plt.xlabel and plt.ylabel label the x-axis and y-axis respectively.
plt.ylabel('Sales')
plt.grid() # for adding gridlines
plt.show() # for displaying the plot.
In [8]: # Example
plt.plot(year, sales,marker = 'o', linestyle = 'dashdot',color="blue",markersize=10)
plt.title('Yearly Sales')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()
In [10]: df.head()
In [11]: # plot graph between months and monthly ticket sales in the comedy genre
plt.plot(df["Month"],df["Comedy"])
plt.show()
In [13]: # Example
plt.plot(df["Month"],df["Comedy"],color="green",marker="o")
plt.plot(df["Month"],df["Thriller"],color="blue",linestyle="dashed",marker="D")
plt.title("Comedy vs Thriller tickets sold Comparison")
plt.xlabel("Months")
plt.ylabel("Tickets sold")
plt.show()
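The cells above use a DataFrame named df whose loading step is not shown. To run them, a stand-in with the same column names can be built by hand. The numbers below are made up for illustration and are not the original ticket data; the legend is an added touch so the two lines can be told apart.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly ticket sales; the values are invented for illustration only.
df = pd.DataFrame({
    "Month": ["jan", "feb", "mar", "apr", "may", "jun"],
    "Comedy": [12000, 15000, 14000, 18000, 21000, 19000],
    "Thriller": [9000, 11000, 16000, 15000, 17000, 22000],
})

# One plt.plot call per column draws both lines on the same axes.
comedy_line, = plt.plot(df["Month"], df["Comedy"],
                        color="green", marker="o", label="Comedy")
thriller_line, = plt.plot(df["Month"], df["Thriller"],
                          color="blue", linestyle="dashed", marker="D", label="Thriller")
plt.legend()  # uses the label= values to name each line
plt.xlabel("Months")
plt.ylabel("Tickets sold")
plt.show()
```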
xlim: The xlim() function in the pyplot module of Matplotlib gets or sets the x-limits of the current axes.
ylim: The ylim() function in the pyplot module of Matplotlib gets or sets the y-limits of the current axes.
In [15]: plt.plot(df["Month"],df["Comedy"])
plt.xlim("jan","jun") # it will show the data from january to june
plt.show()
In [16]: plt.plot(df["Month"],df["Thriller"])
plt.ylim(0,35000) # it will set the y-axis from zero to 35000
plt.show()
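Both functions get as well as set: called with no arguments they return the current (low, high) pair, and called with two values they change it. A small sketch with made-up numeric data:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6]            # illustrative data, not from the lecture
y = [5, 12, 9, 20, 31, 27]

plt.plot(x, y)
auto_limits = plt.xlim()          # no arguments: returns the current (left, right) limits
plt.xlim(2, 5)                    # two arguments: show only x between 2 and 5
plt.ylim(0, 35)                   # the y-axis now runs from 0 to 35
new_xlim = plt.xlim()
new_ylim = plt.ylim()
print(auto_limits, new_xlim, new_ylim)
plt.show()
```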
plt.figure()
We can change the size of a plot using the figsize parameter of the figure() function.
figsize takes a tuple of two values, the width and the height of the figure, both in inches.
Syntax:
figure(figsize=(WIDTH_SIZE,HEIGHT_SIZE))
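A short sketch of figsize in use, reusing the year and sales lists from the basic example. The width and height are in inches, and figure() has to be called before the plotting commands so that they draw on the new figure. The size (10, 4) is an arbitrary choice.

```python
import matplotlib.pyplot as plt

year = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [290, 300, 350, 360, 400, 416]

# figsize is (width, height) in inches; create the figure first, then plot on it.
fig = plt.figure(figsize=(10, 4))
plt.plot(year, sales)
plt.title("Year Wise Sales")
print(fig.get_size_inches())      # width and height of the figure in inches
plt.show()
```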
Limiting axes
In [18]: # here value in 2021 is very big number
price = [48000,54000,57000,49000,47000,45000,4500000]
year = [2015,2016,2017,2018,2019,2020,2021]
plt.plot(year,price)
plt.show()
We can observe that the line looks flat from 2015 to 2020 because one very large value stretches the y-axis.
We can fix this by setting the axis limits with xlim() and ylim().
In [19]: price = [48000,54000,57000,49000,47000,45000,4500000]
year = [2015,2016,2017,2018,2019,2020,2021]
plt.plot(year,price)
plt.ylim(0,75000)
plt.show()
2) Scatter Plots
Scatter plots are graphs that present the relationship between two variables in a data set. They represent data points on a
two-dimensional plane (a Cartesian coordinate system). The independent variable or attribute is plotted on the X-axis, while the dependent
variable is plotted on the Y-axis.
Scatter plots are used for bivariate analysis: an analysis of two variables to determine the relationship between them.
- Used for numerical to numerical columns
- Use case: find out the correlation between numerical columns
plt.scatter()
In [20]: year_of_experience = [10,12,3,4,5,10,5,6]
annual_salary = [100000,120000, 30000, 40000, 50000, 90000,70000,10000]
plt.scatter(year_of_experience,annual_salary,color='red',marker='+')
plt.show()
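To put a number on the relationship that the scatter plot shows, the Pearson correlation coefficient can be computed with NumPy's corrcoef. This is an added illustration using the same two lists, not part of the original notebook; the axis labels and title are my own wording.

```python
import numpy as np
import matplotlib.pyplot as plt

year_of_experience = [10, 12, 3, 4, 5, 10, 5, 6]
annual_salary = [100000, 120000, 30000, 40000, 50000, 90000, 70000, 10000]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation of x with y.
r = np.corrcoef(year_of_experience, annual_salary)[0, 1]

plt.scatter(year_of_experience, annual_salary, color="red", marker="+")
plt.title(f"Experience vs salary (r = {r:.2f})")
plt.xlabel("Years of experience")
plt.ylabel("Annual salary")
plt.show()
```

A value of r close to 1 means the points lie near a rising straight line, a value close to -1 a falling one, and a value near 0 no linear relationship.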
In [22]: tips.head()
plt.text()
Syntax: plt.text(x, y, s, fontdict=None, **kwargs)
Adds the text s to the Axes at location x, y in data coordinates.
In [24]: # Example
iq = [90,110,120,140]
percentage = [40, 60, 70, 80]
plt.scatter(iq,percentage)
plt.text(90,40,"Rahul") # pass coordinates and text
plt.text(110,60,"Raj")
plt.text(120,70,"Kiran")
plt.text(140,80,"Dhoni")
plt.show()
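When there are many points, the repeated plt.text calls above can be written as a loop over the coordinates and names. The small offset added to x is an illustrative choice so that the label does not sit on top of its marker.

```python
import matplotlib.pyplot as plt

iq = [90, 110, 120, 140]
percentage = [40, 60, 70, 80]
names = ["Rahul", "Raj", "Kiran", "Dhoni"]

plt.scatter(iq, percentage)
labels = []
for x, y, name in zip(iq, percentage, names):
    # plt.text returns a Text object placed at (x, y) in data coordinates.
    labels.append(plt.text(x + 1, y, name))
plt.xlabel("IQ")
plt.ylabel("Percentage")
plt.show()
```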
3) Bar Chart
A bar chart is used when you want to show a distribution of data points or compare metric values across different
subgroups of your data. From a bar chart, we can see which groups are highest or most common, and how the other groups compare
against them.
In a bar chart, one axis represents the categories and the other axis represents the value or count
for each category.
Examples:
- Total sales by product category
- Population by country
- Revenue by department
plt.bar()
In [26]: # create a bar chart
plt.bar(x=players,height=runs)
plt.show()
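The players and runs variables used above are not defined in the cells shown. A stand-in with made-up values, purely for illustration, lets the cell run:

```python
import matplotlib.pyplot as plt

# Hypothetical data: the names and scores below are invented placeholders.
players = ["Player A", "Player B", "Player C", "Player D"]
runs = [450, 380, 520, 300]

bars = plt.bar(x=players, height=runs)   # one vertical bar per player
plt.xlabel("Player")
plt.ylabel("Runs")
plt.show()
```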
You can observe that in the bar graph below, the x-axis tick labels overlap.
In [27]: # Example
name = ["sachin verma","shubham patel","rahul patidar","Devendra patidar","Lalit mandloi"]
sales = [15000, 12000, 20000, 13000,14000]
plt.bar(name,sales)
plt.show()
plt.xticks(): the xticks() function is used to get or set the current tick locations and labels of the x-axis.
In [28]: # Example
name = ["sachin verma","shubham patel","rahul patidar","Devendra patidar","Lalit mandloi"]
sales = [15000, 12000, 20000, 13000,14000]
plt.bar(name,sales)
plt.xticks(rotation="vertical") # use rotation parameter to set the labels vertical or horizontal
plt.show()
Horizontal bar
A horizontal bar chart is a great option for long category names, because there is more space on the left-hand side of the chart
for axis labels, which can stay horizontal.
A horizontal bar chart is a better choice if the text on the x-axis of a vertical bar chart would have to be diagonal (or worse,
cut off) to fit.
plt.barh()
In [30]: plt.barh(players,runs,color="green")
plt.show()
4) Histogram
A histogram is the graphical representation of data where data is grouped into continuous number ranges and each range
corresponds to a vertical bar.
The horizontal axis displays the number range.
The vertical axis (frequency) represents the amount of data that is present in each range.
Use case: showing the distribution of a continuous data set.
Examples:
- Frequency of test scores among students
- Distribution of population by age group
- Distribution of heights or weights
In [31]: tips.head()
plt.hist(tips["tip"])
plt.show()
In [34]: # Example
vk_runs = pd.read_csv("virat_kohli_ipl_match_runs.csv")
vk_runs.head()
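The number ranges described above are called bins, and plt.hist accepts a bins argument to control how many there are. It also returns the count in each bin and the bin edges. The scores below are made-up values for illustration, and bins=6 is an arbitrary choice.

```python
import matplotlib.pyplot as plt

# Illustrative test scores, not real data.
scores = [35, 42, 48, 51, 55, 58, 60, 62, 65, 67, 70, 71, 74, 78, 81, 85, 88, 92]

# bins=6 splits the range from the smallest to the largest score into 6 equal-width intervals.
counts, bin_edges, patches = plt.hist(scores, bins=6, edgecolor="black")
plt.xlabel("Score")
plt.ylabel("Number of students")
print(counts)      # how many scores fall in each bin
print(bin_edges)   # 7 edges delimit 6 bins
plt.show()
```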
5) Pie Chart
A pie chart is a type of graph in which a circle is divided into sectors that each represent a proportion of the whole (100%).
Examples:
- Percentage of budget spent by department
- Gender distribution
- Favorite type of movie
In [36]: # no of movies according to genre
genre = ["Comedy","Action","Romance","Drama","SciFi"]
no_of_movies = [1000,1500,1200,900,800]
plt.pie(no_of_movies,labels=genre)
plt.show()
In [38]: # Example
# autopct formats the percentage label on each slice; shadow=True draws a shadow under the pie
plt.pie(no_of_movies,labels=genre,autopct='%0.1f%%',shadow=True)
plt.show()
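The percentage labels are each value's share of the total. The sketch below computes the same shares by hand and also uses the explode argument of plt.pie to pull one slice out; which slice to highlight is an arbitrary choice here.

```python
import matplotlib.pyplot as plt

genre = ["Comedy", "Action", "Romance", "Drama", "SciFi"]
no_of_movies = [1000, 1500, 1200, 900, 800]

# Share of each genre as a percentage of all movies.
total = sum(no_of_movies)
shares = [100 * n / total for n in no_of_movies]
print([round(s, 1) for s in shares])

# explode gives the radial offset of each slice; only "Action" is pulled out.
explode = [0, 0.1, 0, 0, 0]
wedges, texts, autotexts = plt.pie(no_of_movies, labels=genre,
                                   autopct="%0.1f%%", explode=explode)
plt.show()
```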
6) Heatmap
A heatmap is a two‑dimensional graphical representation of data where the individual values that are contained in a matrix are
represented as colours.
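In [40]: import seaborn as sns
flights = sns.load_dataset("flights")
# Assumed step, not shown in the original cell: pivot the long table into the month x year grid shown in the output
flights = flights.pivot(index="month", columns="year", values="passengers")
flights.head()
Out[40]:
year   1949  1950  1951  1952  1953  1954  1955  1956  1957  1958  1959  1960
month
Jan     112   115   145   171   196   204   242   284   315   340   360   417
Feb     118   126   150   180   196   188   233   277   301   318   342   391
Mar     132   141   178   193   236   235   267   317   356   362   406   419
Apr     129   135   163   181   235   227   269   313   348   348   396   461
May     121   125   172   183   229   234   270   318   355   363   420   472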
We can make a heatmap with the imshow() function.
In [41]: plt.figure(figsize=(10,10)) # create figure
plt.imshow(flights)
plt.title("Matplotlib Heatmap with imshow")
plt.show()
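import numpy as np # needed for np.arange in the next cell
years = flights.columns.values
months = flights.index.values # row labels, used for the y-axis ticks in the next cell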
In [43]: plt.figure()
plt.imshow(flights)
plt.title("Matplotlib Heatmap with imshow")
plt.xticks(np.arange(len(years)),labels=years,rotation="vertical")
plt.yticks(np.arange(len(months)),labels=months)
plt.tight_layout()
plt.colorbar() # Add a colorbar to a plot.
plt.show()
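Because the flights example depends on downloading a dataset, here is a self-contained sketch of the same steps (imshow, tick labels, colorbar) on a small matrix. The values are the first three rows and four columns of the table shown earlier; the colormap and title are my own choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Rows are months, columns are years (a small corner of the flights table).
data = np.array([[112, 115, 145, 171],
                 [118, 126, 150, 180],
                 [132, 141, 178, 193]])
months = ["Jan", "Feb", "Mar"]
years = [1949, 1950, 1951, 1952]

plt.figure()
image = plt.imshow(data, cmap="viridis")            # each cell is coloured by its value
plt.xticks(np.arange(len(years)), labels=years)     # one tick per column
plt.yticks(np.arange(len(months)), labels=months)   # one tick per row
plt.colorbar(image)                                 # scale that maps colours to values
plt.title("Passengers by month and year")
plt.tight_layout()
plt.show()
```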
In [45]: plt.boxplot(tips["total_bill"])
plt.show()
Graph styles
Matplotlib has a convenient option to apply a preset style to your plots and improve on the classic Matplotlib look. We can choose from a
range of stylesheets available in Matplotlib. The options can be listed by executing the following:
plt.style.available
This gives a list of all the available stylesheet names, any of which can be passed as an argument to
plt.style.use()
Out[46]: ['Solarize_Light2',
'_classic_test_patch',
'_mpl-gallery',
'_mpl-gallery-nogrid',
'bmh',
'classic',
'dark_background',
'fast',
'fivethirtyeight',
'ggplot',
'grayscale',
'seaborn-v0_8',
'seaborn-v0_8-bright',
'seaborn-v0_8-colorblind',
'seaborn-v0_8-dark',
'seaborn-v0_8-dark-palette',
'seaborn-v0_8-darkgrid',
'seaborn-v0_8-deep',
'seaborn-v0_8-muted',
'seaborn-v0_8-notebook',
'seaborn-v0_8-paper',
'seaborn-v0_8-pastel',
'seaborn-v0_8-poster',
'seaborn-v0_8-talk',
'seaborn-v0_8-ticks',
'seaborn-v0_8-white',
'seaborn-v0_8-whitegrid',
'tableau-colorblind10']
In [47]: # data
year = [2018,2019,2020,2021,2022,2023]
sales = [290, 300, 350, 360, 400, 416]
In [49]: # Example
plt.style.use('ggplot')
plt.bar(x=year,height=sales)
plt.show()
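plt.style.use changes the style for every plot that follows in the session. To apply a style to a single figure only, Matplotlib also provides plt.style.context, used as a with block. A short sketch with the same data (the title text is my own):

```python
import matplotlib.pyplot as plt

year = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [290, 300, 350, 360, 400, 416]

style_name = "ggplot"
if style_name in plt.style.available:        # check that the name is a known style
    with plt.style.context(style_name):      # the style applies only inside this block
        bars = plt.bar(x=year, height=sales)
        plt.title("Sales with the ggplot style")
        plt.show()

# Plots created after the block use the previous settings again.
```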
Save figure
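We can save a figure to an image file. Call savefig() before show(), because the figure may be cleared once it has been shown.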
plt.savefig()
In [50]: # Example
plt.style.use('ggplot')
plt.bar(x=year,height=sales)
plt.savefig("bargraph.png")
plt.show()
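savefig works out the file format from the file extension and accepts options such as dpi (resolution) and bbox_inches="tight" (trim the surrounding whitespace). The sketch below writes to a temporary directory so that it does not clutter the working directory; the file names and the dpi value are arbitrary.

```python
import os
import tempfile
import matplotlib.pyplot as plt

year = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [290, 300, 350, 360, 400, 416]

plt.bar(x=year, height=sales)
plt.title("Yearly Sales")

out_dir = tempfile.mkdtemp()                          # throwaway folder for the demo
png_path = os.path.join(out_dir, "bargraph.png")
pdf_path = os.path.join(out_dir, "bargraph.pdf")

# Save before plt.show(), while the figure still holds the drawing.
plt.savefig(png_path, dpi=200, bbox_inches="tight")   # raster image at 200 dots per inch
plt.savefig(pdf_path, bbox_inches="tight")            # vector output, format taken from ".pdf"
print("Saved:", png_path, pdf_path)
plt.show()
```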
Great Job!