01-Matplotlib


Matplotlib

Introduction
Matplotlib is the "grandfather" library of data visualization with Python. It was created by John Hunter, who wanted to
replicate the plotting capabilities of MATLAB (another programming language) in Python. So if you happen to be familiar with MATLAB,
Matplotlib will feel natural to you.
It is an excellent 2D and 3D graphics library for generating scientific figures.
Some of the major pros of Matplotlib are:
- Generally easy to get started for simple plots
- Support for custom labels and texts
- Great control of every element in a figure
- High-quality output in many formats
- Very customizable in general
Matplotlib allows you to create reproducible figures programmatically. Let's learn how to use it! Before continuing this lecture, I
encourage you to explore the official Matplotlib web page: http://matplotlib.org/

Installation
You'll need to install Matplotlib first with either:
conda install matplotlib

or
pip install matplotlib


Importing

Import the matplotlib.pyplot module under the name plt (the tidy way):
In [1]: import matplotlib.pyplot as plt

You'll also need to use this line to see plots in the notebook:
In [2]: %matplotlib inline

That line is only for Jupyter notebooks. If you are using another editor, call plt.show() at the end of all your plotting
commands to have the figure pop up in another window.

Matplotlib Graph Structure


figure: the outermost container in a Matplotlib graph/chart is called a figure.
axes: each figure can contain one or more axes, which are the actual plots.
Each of the axes has further sub-components such as the axis (x and y), title, legend, axis labels, major and minor ticks, etc.
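
The figure/axes hierarchy described above can be sketched as follows. This is only an illustration: it assumes the plt.subplots() helper, which is not used elsewhere in this lecture, and the data values are made up.

```python
import matplotlib.pyplot as plt

# One figure (the outer container) holding two axes (the actual plots)
fig, (ax_left, ax_right) = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# Each axes has its own sub-components: title, axis labels, ticks, ...
ax_left.plot([1, 2, 3], [1, 4, 9])
ax_left.set_title("Left axes")
ax_left.set_xlabel("x")
ax_left.set_ylabel("y")

ax_right.plot([1, 2, 3], [9, 4, 1])
ax_right.set_title("Right axes")

# The figure keeps a list of the axes it contains
n_axes = len(fig.axes)
print(n_axes)
plt.show()
```

The rest of this lecture uses the plt.* functions, which always act on the current figure and the current axes.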

In [3]: # import required libraries


import numpy as np
import pandas as pd
import seaborn as sns

Basic Example
Let's walk through a very simple example using two Python lists.
The data we want to plot:
In [4]: year = [2018,2019,2020,2021,2022,2023]
sales = [290, 300, 350, 360, 400, 416]

Basic Matplotlib Commands


We can create a very simple line plot using the following (I encourage you to pause and use Shift+Tab along the way to check out
the docstrings for the functions we are using).
The pyplot module provides the plot() function, which is frequently used to plot a graph.
1) Line plot
A line chart is used to show the change in information over time. The horizontal axis is usually a time scale; for example, minutes,
hours, days, months, or years.
Use the plt.plot() function and provide data to create a plot. Then use the plt.show() function to display the plot.

In [5]: # to plot x versus y


plt.plot(year,sales) # default graph is line graph
plt.show() # used to display the figure

Add chart elements


plt.title(): Sets the title of the chart, which is passed as an argument.
plt.xlabel(): Sets the label of the X axis.

plt.ylabel(): Sets the label of the Y axis.

plt.grid(): for adding gridlines.

plt.show(): for displaying the plot.

In [6]: # Example
plt.plot(year, sales) # default graph is line graph
plt.title('Year Wise Sales') # for setting the title of the plot.
plt.xlabel('Year') # plt.xlabel and plt.ylabel label the x-axis and y-axis respectively.
plt.ylabel('Sales')
plt.grid() # for adding gridlines
plt.show() # for displaying the plot.

Formatting the style of the plot


You can use the keyword argument marker to emphasize each point with a specified marker.
You can use the keyword argument linestyle, or the shorter ls, to change the style of the plotted line.
You can use the grid() function to add grid lines to the plot.
You can use the keyword argument color to change the color of the line.
In [7]: plt.plot(year, sales,marker = '*', linestyle = 'dotted',color="green")
plt.title('Yearly sales')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()

In [8]: # Example
plt.plot(year, sales,marker = 'o', linestyle = 'dashdot',color="blue",markersize=10)
plt.title('Yearly Sales')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()
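
The two cells above do not use the ls shorthand or grid(), so here is a small sketch that combines them. The marker "s" (square) and the "--" short code for a dashed line are choices made for this illustration, not values from the lecture.

```python
import matplotlib.pyplot as plt

year = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [290, 300, 350, 360, 400, 416]

plt.figure()
# ls is the short form of linestyle; "--" is the short code for a dashed line
lines = plt.plot(year, sales, marker="s", ls="--", color="red")
plt.grid()  # add grid lines
plt.title("Yearly Sales")
plt.xlabel("Year")
plt.ylabel("Sales")
plt.show()
```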

For more details, follow this link:
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot

Plot from a pandas DataFrame
In [9]: df = pd.read_csv("movie_ticket_sold.csv")

In [10]: df.head()

Out[10]:
  Month  Comedy  Thriller
0   jan   49832     12839
1   feb   47232     16828
2   mar   40002     15839
3   apr   37283     18082
4   may   32910     24932

In [11]: # plot graph between months and monthly ticket sales in the comedy genre
plt.plot(df["Month"],df["Comedy"])
plt.show()

In [12]: # plotting multiple lines on the same axes
plt.plot(df["Month"],df["Comedy"])
plt.plot(df["Month"],df["Thriller"])
plt.title("Comedy vs Thriller ticket sales Comparison")
plt.xlabel("Months")
plt.ylabel("Monthly Ticket sales")
plt.show()

In [13]: # Example
plt.plot(df["Month"],df["Comedy"],color="green",marker="o")
plt.plot(df["Month"],df["Thriller"],color="blue",linestyle="dashed",marker="D")
plt.title("Comedy vs Thriller tickets sold Comparison")
plt.xlabel("Months")
plt.ylabel("Tickets sold")
plt.show()

plt.legend(): Displays the legend on the plot.


Pass another parameter, label, to plt.plot(). This assigns a name to the line, which we can later show in the legend.
In [14]: # Add legend
# Example
plt.plot(df["Month"],df["Comedy"],label="Comedy") # the label parameter names the line in the legend
plt.plot(df["Month"],df["Thriller"],label="Thriller")
plt.title("Comedy vs Thriller tickets sold Comparison")
plt.xlabel("Months")
plt.ylabel("Tickets sold")
plt.legend() # show the legend (the labels)
plt.show()

xlim: The xlim() function in the pyplot module gets or sets the x-limits of the current axes.
ylim: The ylim() function in the pyplot module gets or sets the y-limits of the current axes.

In [15]: plt.plot(df["Month"],df["Comedy"])
plt.xlim("jan","jun") # it will show the data from january to june
plt.show()

In [16]: plt.plot(df["Month"],df["Thriller"])
plt.ylim(0,35000) # it will set the y-axis from zero to 35000
plt.show()
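
The cells above only set limits. As a small sketch of the "get" side, calling xlim() or ylim() with no arguments returns the current limits as a pair. The yearly sales data from earlier in this lecture is reused here.

```python
import matplotlib.pyplot as plt

year = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [290, 300, 350, 360, 400, 416]

plt.figure()
plt.plot(year, sales)
plt.ylim(0, 500)           # set: the y-axis runs from 0 to 500
bottom, top = plt.ylim()   # get: no arguments returns the current limits
left, right = plt.xlim()   # x-limits that Matplotlib chose automatically
print(bottom, top)
print(left, right)
plt.show()
```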

plt.figure()
We can change the size of the plot above using the figsize parameter of the figure() function.
The figsize parameter takes a tuple of two values: the width and the height, both in inches.
Syntax:
figure(figsize=(WIDTH_SIZE,HEIGHT_SIZE))

In [17]: plt.figure(figsize=(10,3)) # set the size of figure


plt.plot(df["Month"],df["Comedy"])
plt.show()
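
To check that the requested size was applied, the figure object returned by plt.figure() can be queried. This sketch assumes the get_size_inches() method of the figure, and the plotted values are made up.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 3))   # 10 inches wide, 3 inches tall
plt.plot([1, 2, 3], [2, 4, 8])

# Read the size back from the figure object
width, height = fig.get_size_inches()
print(width, height)
plt.show()
```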

Limiting axes
In [18]: # here the value for 2021 is a very large number
price = [48000,54000,57000,49000,47000,45000,4500000]
year = [2015,2016,2017,2018,2019,2020,2021]

plt.plot(year,price)
plt.show()

We can observe that the line looks flat from 2015 to 2020, because one very large value stretches the y-axis.
So we can set the axis limits with xlim() and ylim().
In [19]: price = [48000,54000,57000,49000,47000,45000,4500000]
year = [2015,2016,2017,2018,2019,2020,2021]
plt.plot(year,price)

plt.ylim(0,75000)
plt.show()

2) Scatter Plots
Scatter plots present the relationship between two variables in a dataset. Each data point is drawn on a
two-dimensional plane (a Cartesian system). The independent variable or attribute is plotted on the X-axis, while the dependent
variable is plotted on the Y-axis.
Bivariate analysis: an analysis of two variables to determine the relationship between them.
Used for numerical-to-numerical columns.
Use case - find out the correlation between numerical columns.
plt.scatter()
In [20]: year_of_experience = [10,12,3,4,5,10,5,6]
annual_salary = [100000,120000, 30000, 40000, 50000, 90000,70000,10000]
plt.scatter(year_of_experience,annual_salary,color='red',marker='+')
plt.show()
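
The correlation use case can be made concrete by computing a correlation coefficient next to the plot. This is a sketch that assumes NumPy's corrcoef() function (Pearson correlation), which this lecture does not otherwise use. It reuses the experience and salary data from the cell above.

```python
import numpy as np
import matplotlib.pyplot as plt

year_of_experience = [10, 12, 3, 4, 5, 10, 5, 6]
annual_salary = [100000, 120000, 30000, 40000, 50000, 90000, 70000, 10000]

# Pearson correlation coefficient: close to +1 means a strong positive linear
# relationship, close to -1 a strong negative one, close to 0 no linear relationship
r = np.corrcoef(year_of_experience, annual_salary)[0, 1]

plt.figure()
plt.scatter(year_of_experience, annual_salary, color="red", marker="+")
plt.title(f"Experience vs salary (r = {r:.2f})")
plt.xlabel("Years of experience")
plt.ylabel("Annual salary")
plt.show()
```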

In [21]: tips = sns.load_dataset("tips") # loading a tip dataset from seaborn

In [22]: tips.head()

Out[22]:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

In [23]: # plot scatter graph between total_bill and tip


plt.scatter(tips["total_bill"],tips["tip"])
plt.xlabel("Total bill")
plt.ylabel("Tips")
plt.title("Relation between total bill and tip")
plt.show()

plt.text()
Syntax: plt.text(x, y, s, fontdict=None, **kwargs)
Add text to the Axes.
Add the text s to the Axes at location x, y in data coordinates.
In [24]: # Example

iq = [90,110,120,140]
percentage = [40, 60, 70, 80]
plt.scatter(iq,percentage)
plt.text(90,40,"Rahul") # pass coordinates and text
plt.text(110,60,"Raj")
plt.text(120,70,"Kiran")
plt.text(140,80,"Dhoni")
plt.show()
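
When there are many points, the repeated plt.text() calls can be replaced by a loop. The sketch below uses the same data as the cell above. It switches to the axes method ax.text(), which takes the same x, y, s arguments, so that the labels can be counted afterwards.

```python
import matplotlib.pyplot as plt

names = ["Rahul", "Raj", "Kiran", "Dhoni"]
iq = [90, 110, 120, 140]
percentage = [40, 60, 70, 80]

fig, ax = plt.subplots()
ax.scatter(iq, percentage)

# One text label per point, placed at the point's data coordinates
for name, x, y in zip(names, iq, percentage):
    ax.text(x, y, name)

n_labels = len(ax.texts)
print(n_labels)
plt.show()
```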

3) Bar chart
A bar chart is used when you want to show a distribution of data points or perform a comparison of metric values across different
subgroups of your data. From a bar chart, we can see which groups are highest or most common, and how other groups compare
against the others.
In a bar chart, one axis represents the categories and the other axis represents the value or count
for each category.
Examples:
- Total sales by product category
- Population by country
- Revenue by department
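
When the raw data is a list of category labels rather than ready-made totals, the counts have to be computed first. A minimal sketch, assuming made-up order data and Python's collections.Counter (neither is part of the original lecture):

```python
from collections import Counter
import matplotlib.pyplot as plt

# Hypothetical raw data: one entry per order
orders = ["Books", "Toys", "Books", "Games", "Toys", "Books"]
counts = Counter(orders)   # category -> number of occurrences

categories = list(counts.keys())
values = list(counts.values())

plt.figure()
plt.bar(categories, values)   # one axis shows the categories, the other the counts
plt.xlabel("Product category")
plt.ylabel("Number of orders")
plt.show()
```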

In [25]: players = ["Dhoni","Kohli", "Sachin", "Jadeja", "KL"]


runs = [2000,3000, 4000, 2500, 2000]

plt.bar()
In [26]: # create a bar chart

plt.bar(x=players,height=runs)
plt.show()

You can observe that in the bar graph below, the x-axis tick labels overlap.
In [27]: # Example
name = ["sachin verma","shubham patel","rahul patidar","Devendra patidar","Lalit mandloi"]
sales = [15000, 12000, 20000, 13000,14000]
plt.bar(name,sales)
plt.show()

plt.xticks(): the xticks() function is used to get or set the current tick locations and labels of the x-axis.
In [28]: # Example
name = ["sachin verma","shubham patel","rahul patidar","Devendra patidar","Lalit mandloi"]
sales = [15000, 12000, 20000, 13000,14000]
plt.bar(name,sales)
plt.xticks(rotation="vertical") # use rotation parameter to set the labels vertical or horizontal
plt.show()

In [29]: # we can also change the tick labels


name = ["sachin verma","shubham patel","rahul patidar","Devendra patidar","Lalit mandloi"]
sales = [15000, 12000, 20000, 13000,14000]
plt.bar(name,sales)
plt.xticks(name,["a","b","c","d","e"])
plt.show()
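
Besides "vertical", the rotation parameter also accepts an angle in degrees. The sketch below rotates the labels by 45 degrees. The ha="right" alignment and the tight_layout() call are additions made for this illustration.

```python
import matplotlib.pyplot as plt

name = ["sachin verma", "shubham patel", "rahul patidar", "Devendra patidar", "Lalit mandloi"]
sales = [15000, 12000, 20000, 13000, 14000]

plt.figure()
plt.bar(name, sales)
# xticks() returns the tick locations and the label objects it modified;
# ha="right" aligns the end of each rotated label with its tick
locs, labels = plt.xticks(rotation=45, ha="right")
plt.tight_layout()   # make room for the rotated labels
plt.show()
```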

Horizontal bar
A horizontal bar chart is a great option for long category names, because there is more space on the left-hand side of the chart
for axis labels to be placed and horizontally oriented.
A horizontal bar chart is a better choice if the text on the x-axis of a vertical bar chart would have to be diagonal (or worse,
cut off) to fit.
plt.barh()
In [30]: plt.barh(players,runs,color="green")
plt.show()
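
The player names above are short, so the benefit is easier to see with the long names from the earlier bar chart example. A brief sketch using that same data:

```python
import matplotlib.pyplot as plt

name = ["sachin verma", "shubham patel", "rahul patidar", "Devendra patidar", "Lalit mandloi"]
sales = [15000, 12000, 20000, 13000, 14000]

plt.figure()
# barh takes the categories first (y-axis) and the bar lengths second (x-axis)
bars = plt.barh(name, sales, color="green")
plt.xlabel("Sales")
plt.tight_layout()   # keep the long labels inside the figure
n_bars = len(bars)
plt.show()
```

The labels stay horizontal and readable without any rotation.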

4) Histogram
A histogram is a graphical representation of data in which the data is grouped into continuous number ranges (bins) and each range
corresponds to a vertical bar.
The horizontal axis displays the number range.
The vertical axis (frequency) represents the amount of data that is present in each range.
Use case - showing the distribution of a continuous data set.
Examples:
- Frequency of test scores among students
- Distribution of population by age group
- Distribution of heights or weights

In [31]: tips.head()

Out[31]:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
plt.hist()
In [32]: # use tips dataset

plt.hist(tips["tip"])
plt.show()

In [33]: plt.hist(tips["tip"],bins=[2,5,7,9]) # we can set the bin edges explicitly


plt.show()
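
To see what the bins parameter does, it helps to look at the values plt.hist() returns: the count in each bin, the bin edges, and the drawn bars. The sketch below uses a small made-up data set so that the counts can be checked by hand. Values outside the outermost edges are not counted.

```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 8]

plt.figure()
# bins lists the edges: [0, 2), [2, 4), [4, 6), [6, 10]
# (every bin excludes its right edge except the last one)
counts, edges, patches = plt.hist(data, bins=[0, 2, 4, 6, 10], edgecolor="black")
print(counts)   # number of values that fall into each bin
print(edges)    # the bin edges that were used
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```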

In [34]: # Example
vk_runs = pd.read_csv("virat_kohli_ipl_match_runs.csv")
vk_runs.head()

Out[34]:
   match_id  batsman_runs
0        12            62
1        17            28
2        20            64
3        27             0
4        30            10

In [35]: # plot histogram of virat kohli's runs in ipl matches


plt.hist(vk_runs["batsman_runs"])
plt.show()

5) Pie Chart
A pie chart is a type of graph in which a circle is divided into sectors that each represent a proportion of the whole (100%).
Examples:
- Percentage of budget spent by department
- Gender distribution
- Favorite type of movie
In [36]: # number of movies by genre
genre = ["Comedy","Action","Romance","Drama","SciFi"]
no_of_movies = [1000,1500,1200,900,800]
plt.pie(no_of_movies,labels=genre)
plt.show()

In [37]: # Use autopct='%0.1f%%' to display each share as a percentage


plt.pie(no_of_movies,labels=genre,autopct='%0.1f%%')
plt.show()
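
The percentages that autopct prints are each value's share of the total. As a rough check, they can be computed by hand from the same genre data. The shares list is a name introduced for this sketch.

```python
import matplotlib.pyplot as plt

genre = ["Comedy", "Action", "Romance", "Drama", "SciFi"]
no_of_movies = [1000, 1500, 1200, 900, 800]

# Each sector's share of the whole, in percent
total = sum(no_of_movies)
shares = [100 * n / total for n in no_of_movies]
for g, s in zip(genre, shares):
    print(f"{g}: {s:.1f}%")

plt.figure()
plt.pie(no_of_movies, labels=genre, autopct="%0.1f%%")
plt.show()
```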

In [38]: # Example
# set the shadow parameter to True to draw a shadow
plt.pie(no_of_movies,labels=genre,autopct='%0.1f%%',shadow=True)
plt.show()

In [39]: # choose your own colors


# use colors parameter
# here we will use hexadecimal codes of color
plt.pie(no_of_movies,labels=genre,autopct='%0.1f%%',colors = ["#ac92eb", "#4fc1e8", "#a0d568", "#ffce54","#ed55
plt.show()

6) Heatmap
A heatmap is a two-dimensional graphical representation of data in which the individual values contained in a matrix are
represented as colours.
In [40]: flights = sns.load_dataset("flights")

# pivoting to make the data wide


flights = flights.pivot(index="month", columns="year", values="passengers")
flights.head()

Out[40]:
year   1949  1950  1951  1952  1953  1954  1955  1956  1957  1958  1959  1960
month
Jan     112   115   145   171   196   204   242   284   315   340   360   417
Feb     118   126   150   180   196   188   233   277   301   318   342   391
Mar     132   141   178   193   236   235   267   317   356   362   406   419
Apr     129   135   163   181   235   227   269   313   348   348   396   461
May     121   125   172   183   229   234   270   318   355   363   420   472
We can make a heatmap with the imshow() function.
In [41]: plt.figure(figsize=(10,10)) # create figure
plt.imshow(flights)
plt.title("Matplotlib Heatmap with imshow")
plt.show()

In [42]: months = flights.index.values

years = flights.columns.values

In [43]: plt.figure()
plt.imshow(flights)
plt.title("Matplotlib Heatmap with imshow")
plt.xticks(np.arange(len(years)),labels=years,rotation="vertical")
plt.yticks(np.arange(len(months)),labels=months)
plt.tight_layout()
plt.colorbar() # Add a colorbar to a plot.
plt.show()
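
The same steps work on any 2D array, not only on the pivoted flights table. The sketch below uses a small made-up matrix so it runs without downloading a dataset, and it writes each value into its cell with plt.text(). The row and column names and the "viridis" colormap are assumptions made for the illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 3x4 matrix: rows are products, columns are quarters
data = np.array([[5, 7, 9, 12],
                 [3, 4, 8, 6],
                 [10, 2, 1, 4]])
rows = ["A", "B", "C"]
cols = ["Q1", "Q2", "Q3", "Q4"]

plt.figure()
im = plt.imshow(data, cmap="viridis")
plt.xticks(np.arange(len(cols)), labels=cols)
plt.yticks(np.arange(len(rows)), labels=rows)

# imshow draws column j at x = j and row i at y = i
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        plt.text(j, i, str(data[i, j]), ha="center", va="center", color="white")

plt.colorbar(im)   # legend that maps colours to values
plt.show()
```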

Box Plot / Whisker plot


Box plots are used to show the distribution of numeric data values.
A box and whisker plot (also called a box plot) displays the five-number summary of a set of data. The five-number summary is
the minimum, first quartile, median, third quartile, and maximum. In a box plot, we draw a box from the first quartile to the third
quartile.

In [44]: tips = sns.load_dataset("tips")

In [45]: plt.boxplot(tips["total_bill"])
plt.show()
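
The five-number summary can be computed directly and compared with the drawn box. This is a sketch that assumes NumPy's percentile() function and a small made-up data set.

```python
import numpy as np
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# First quartile, median, and third quartile
q1, median, q3 = np.percentile(data, [25, 50, 75])
summary = (min(data), q1, median, q3, max(data))
print(summary)

plt.figure()
plt.boxplot(data)   # the box spans q1 to q3; the line inside marks the median
plt.show()
```

Note that with Matplotlib's default settings the whiskers extend only to the furthest data points within 1.5 times the interquartile range of the box, and points beyond that are drawn individually as outliers. The whiskers therefore reach the minimum and maximum only when the data has no such outliers.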

Graph styles
Matplotlib has a convenient option to apply a preset style to your plots, to improve on the classic Matplotlib look. We can choose from a
range of stylesheets available in Matplotlib. The available options can be listed by executing the following:
plt.style.available

This gives a list of all the available stylesheet names that can be passed as an argument to
plt.style.use()

In [46]: # available styles


plt.style.available

Out[46]: ['Solarize_Light2',
 '_classic_test_patch',
 '_mpl-gallery',
 '_mpl-gallery-nogrid',
 'bmh',
 'classic',
 'dark_background',
 'fast',
 'fivethirtyeight',
 'ggplot',
 'grayscale',
 'seaborn-v0_8',
 'seaborn-v0_8-bright',
 'seaborn-v0_8-colorblind',
 'seaborn-v0_8-dark',
 'seaborn-v0_8-dark-palette',
 'seaborn-v0_8-darkgrid',
 'seaborn-v0_8-deep',
 'seaborn-v0_8-muted',
 'seaborn-v0_8-notebook',
 'seaborn-v0_8-paper',
 'seaborn-v0_8-pastel',
 'seaborn-v0_8-poster',
 'seaborn-v0_8-talk',
 'seaborn-v0_8-ticks',
 'seaborn-v0_8-white',
 'seaborn-v0_8-whitegrid',
 'tableau-colorblind10']

In [47]: # data
year = [2018,2019,2020,2021,2022,2023]
sales = [290, 300, 350, 360, 400, 416]

In [48]: plt.style.use('classic') # use this command to use a chart style


plt.bar(x=year,height=sales)
plt.show()

In [49]: # Example
plt.style.use('ggplot')
plt.bar(x=year,height=sales)
plt.show()
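
A style set with plt.style.use() stays active for every later plot in the session. If a style is wanted for one plot only, one option is the plt.style.context() context manager, which is not covered in this lecture and is shown here as a sketch.

```python
import matplotlib.pyplot as plt

year = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [290, 300, 350, 360, 400, 416]

# The style applies only to plots created inside the with-block
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.bar(year, sales)
    ax.set_title("ggplot style, temporarily")
    plt.show()

# Alternatively, go back to Matplotlib's default look after plt.style.use(...)
plt.style.use("default")
```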

Save figure
We can save figures to a file.
plt.savefig()

In [50]: # Example
plt.style.use('ggplot')
plt.bar(x=year,height=sales)
plt.savefig("bargraph.png")
plt.show()
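
savefig() infers the file format from the file extension (for example .png, .pdf, or .svg) and accepts options that control the output. The sketch below assumes the dpi and bbox_inches parameters and writes to a temporary directory, which is a choice made for this illustration only. It calls savefig() before show(), as in the cell above, because with some backends the figure is no longer available after it has been shown.

```python
import os
import tempfile
import matplotlib.pyplot as plt

year = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [290, 300, 350, 360, 400, 416]

fig, ax = plt.subplots()
ax.bar(year, sales)

# Hypothetical output location: a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "bargraph.png")

# dpi sets the resolution; bbox_inches="tight" trims the surrounding whitespace
fig.savefig(out_path, dpi=150, bbox_inches="tight")
saved = os.path.exists(out_path)
print(out_path, saved)
plt.show()
```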

Great Job!
