Dap Module5 (New 2024)
Module 5
Visualization with Matplotlib and Seaborn
Matplotlib package
Matplotlib is a multiplatform data visualization library built on NumPy arrays. The matplotlib package is the main graphing and plotting tool for Python. The package is versatile and highly configurable, supporting several graphing interfaces.
Matplotlib, together with NumPy and SciPy, provides MATLAB-like graphing capabilities.
The benefits of using matplotlib in the context of data analysis and visualization are as follows:
• Integration with NumPy and SciPy (used for signal processing and numerical analysis) is
seamless.
• The package is highly customizable and configurable, catering to most people’s needs.
• The package is quite extensive and allows plots to be embedded in a graphical user interface.
Other Advantages
• One of Matplotlib’s most important features is its ability to play well with many operating
systems and graphics backends. Matplotlib supports dozens of backends and output types,
which means you can count on it to work regardless of which operating system you are
using or which output format you wish. This cross-platform, everything-to-everyone
approach has been one of the great strengths of Matplotlib.
• This versatility has led to a large user base, which in turn has led to an active developer base and to Matplotlib’s powerful tools and ubiquity within the scientific Python world.
• Libraries such as Pandas can themselves be used as wrappers around Matplotlib’s API. Even with wrappers like these, it is still often useful to dive into Matplotlib’s syntax to adjust the final plot output.
Plotting Graphs
This section details the building blocks of plotting graphs: the plot() function and how to
control it to generate the output we require.
The functionality of plot() is similar to that of MATLAB and GNU-Octave with some minor
differences, mostly due to the fact that Python has a different syntax from MATLAB and
GNU-Octave.
When a vector y is passed as the only input to plot(), plot() draws a graph of y using auto-incrementing integers for the x-axis. In other words, if x-axis values are not supplied, plot() generates them automatically: plot(y) is equivalent to plot(range(len(y)), y).
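A minimal sketch of this equivalence, assuming NumPy is available; the values in y are arbitrary sample data chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # arbitrary sample data

# No x values supplied: plot() uses 0, 1, ..., len(y) - 1 for the x-axis
plt.figure()
line_auto, = plt.plot(y)

# The equivalent call, with the x values written out explicitly
plt.figure()
line_explicit, = plt.plot(range(len(y)), y)

print(line_auto.get_xdata())      # auto-generated x values
print(line_explicit.get_xdata())  # the same values, supplied by hand
```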
Note: If you don’t have a GUI installed with matplotlib, replace show() with savefig('filename') and open the generated image file in an image viewer.
Calling figure() creates a new figure to plot on, so we don’t overwrite the previous figure.
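The sketch below combines the note on savefig() with figure(): each figure() call starts a fresh figure, and savefig() writes the current figure to an image file when no GUI is available. The file names first.png and second.png are hypothetical, and the Agg backend is selected explicitly as an assumption for a GUI-less setup.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no GUI window is needed
import matplotlib.pyplot as plt

y = [1, 4, 9, 16]

plt.figure()              # first figure
plt.plot(y)
plt.savefig("first.png")  # save to a file instead of calling show()

plt.figure()              # a new figure, so the first one is not overwritten
plt.plot(y[::-1])
plt.savefig("second.png")

print(plt.get_fignums())  # numbers of the figures that are still open
```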
Let’s look at some more options. Next, we want to plot y as a function of t, but display only markers, not lines. This is easily done by passing the marker character 'o' as a format string, as in plot(t, y, 'o').
To select a different marker, replace the character 'o' with another marker symbol. The table below lists some popular choices; issuing help(plot) provides a full account of the available markers.
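A short sketch of marker-only plotting. The text does not show how t and y were created, so the sine samples below are an illustrative assumption. The format string 'o' draws circles with no connecting line; 's' (square) is used as the swapped-in symbol.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2 * np.pi, 20)  # illustrative sample points
y = np.sin(t)

plt.figure()
markers, = plt.plot(t, y, 'o')   # circles only, no connecting line

plt.figure()
squares, = plt.plot(t, y, 's')   # another marker symbol: squares

# A marker-only format string leaves the line style set to 'None'
print(markers.get_marker(), markers.get_linestyle())
```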
Controlling Graph
The data is important, but it is not everything when a graph must convey an idea clearly and aesthetically. The grid and grid lines, combined with a proper selection of axes and labels, present additional layers of information that add clarity and contribute to the overall presentation of the graph.
Now, let’s focus on controlling the figure by controlling the x-axis and y-axis behavior and setting grid lines.
• Axis
• Grid and Ticks
• Subplots
• Erasing the Graph
Axis
The axis() function controls the behavior of the x-axis and y-axis ranges. If you do not supply a
parameter to axis(), the return value is a tuple in the form (xmin, xmax, ymin, ymax). You can
use axis() to set the new axis ranges by specifying new values: axis([xmin, xmax, ymin, ymax]).
If you’d like to set or retrieve only the x-axis values or y-axis values, do so by using the
functions xlim(xmin, xmax) or ylim(ymin, ymax), respectively.
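A minimal sketch of reading and setting axis ranges with the pyplot functions named above; the sine curve is just sample data.

```python
import numpy as np
import matplotlib.pyplot as plt

plt.figure()
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))

print(plt.axis())              # no argument: current (xmin, xmax, ymin, ymax)

plt.axis([0, 5, -2, 2])        # set all four limits in one call
print(plt.axis())

plt.xlim(1, 4)                 # change only the x-axis range
plt.ylim(-1.5, 1.5)            # change only the y-axis range
print(plt.xlim(), plt.ylim())  # no arguments: return the current limits
```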
The function axis() also accepts the following values: 'auto', 'equal', 'tight', 'scaled', and 'off'.
• The value 'auto', the default behavior, allows plot() to select what it thinks are the best values.
• The value 'equal' forces one unit along the x-axis to have the same length as one unit along the y-axis (by adjusting the axis ranges), which is important if you’re trying to convey physical distances, such as in a GPS plot.
• The value 'tight' causes the axis to change so that the maximum and minimum values of x and y both touch the edges of the graph.
• The value 'scaled' also gives x and y equal scaling (i.e., an aspect ratio of 1), but achieves it by changing the dimensions of the plot box instead of the axis ranges.
• Lastly, calling axis('off') removes the axis lines and labels.
The figure below shows the results of applying different axis values to this circle.
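The circle referred to above is not shown in these notes, so the sketch below assumes a unit circle drawn parametrically and applies each axis() option in its own subplot. It uses Axes.axis(), which accepts the same string options as plt.axis().

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 200)
x, y = np.cos(theta), np.sin(theta)   # assumed unit circle

options = ['auto', 'equal', 'tight', 'scaled', 'off']
fig, axes = plt.subplots(1, len(options), figsize=(15, 3))
for ax, opt in zip(axes, options):
    ax.plot(x, y)
    ax.axis(opt)        # apply one option per panel
    ax.set_title(opt)

fig.tight_layout()
```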
The function grid() draws a grid in the current figure. The grid is composed of a set of horizontal and vertical lines coinciding with the x ticks and y ticks. You can toggle the grid by calling grid() or set it to be either visible or hidden by using grid(True) or grid(False), respectively.
ROOPA.H.M, Dept of MCA, RNSIT Page 4
Module 5 [20MCA31] Data Analytics using Python
To control the ticks (and effectively change the grid lines, as well), use the functions xticks() and yticks(). The functions behave similarly to axis() in that they return the current ticks if no parameters are passed; you can also use these functions to set ticks once parameters are provided. The functions take an array holding the tick values as numbers and an optional tuple containing text labels. If the tuple of labels is not provided, the tick numbers are used as labels.
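A sketch of grid and tick control with grid(), xticks(), and yticks(); the sine data and the 'pi/2'-style label strings are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

plt.figure()
x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))

plt.grid(True)   # show the grid; grid(False) hides it, grid() toggles it

# Tick positions plus an optional tuple of text labels; grid lines follow the ticks
plt.xticks([0, np.pi / 2, np.pi, 3 * np.pi / 2, 2 * np.pi],
           ('0', 'pi/2', 'pi', '3pi/2', '2pi'))
plt.yticks([-1, 0, 1])           # numbers only: the values become the labels

locs, labels = plt.xticks()      # no arguments: return the current ticks
print(locs)
```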
Adding Text
There are several options to annotate your graph with text. You’ve already seen some, such as the xticks() and yticks() functions.
The following functions will give you more control over text in a graph.
Title
The function title(str) sets str as the title of the graph, which appears above the plot area. The function accepts the arguments listed in Table 6-5.
All alignments are based on the default location, which is centered above the graph. Thus,
setting ha='left' will print the title starting at the middle (horizontally) and extending to the
right. Similarly, setting ha='right' will print the title ending in the middle of the graph
(horizontally). The same applies for vertical alignment. Here’s an example of using the title()
function:
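A sketch contrasting the default title position with ha='left'; the plotted points are arbitrary. Because the anchor stays at the horizontal center, the second title starts at the middle of the axes and extends to the right. fontsize is one more text argument title() accepts.

```python
import matplotlib.pyplot as plt

plt.figure()

plt.subplot(2, 1, 1)
plt.plot([1, 2, 3], [2, 4, 3])
t1 = plt.title('Centered title')          # default: centered above the plot

plt.subplot(2, 1, 2)
plt.plot([1, 2, 3], [2, 4, 3])
# ha='left' keeps the centered anchor, so the text starts at the middle
t2 = plt.title('Starts at the middle', ha='left', fontsize=14)

plt.tight_layout()
```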
The functions xlabel() and ylabel() are similar to title(), only they’re used to set the x-axis and y-axis labels, respectively. Both these functions accept the same text arguments as title().
>>> xlabel('time [seconds]')
Next on our list of text functions is legend(). The legend() function adds a legend box and associates each plot with a text label.
When the labels are passed directly to legend(), their order determines which plot each label belongs to: the first label goes with the first plot, and so on. An alternative approach is to specify the label argument in each plot() call, and then call legend() with no parameters.
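Both legend approaches, sketched with sine and cosine curves as sample data. loc='upper right' is one of the location strings accepted by legend().

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

# Approach 1: labels given to legend() in the same order as the plot() calls
plt.figure()
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.legend(['sin', 'cos'])

# Approach 2: label each plot, then call legend() without label arguments
plt.figure()
plt.plot(x, np.sin(x), label='sin')
plt.plot(x, np.cos(x), label='cos')
leg = plt.legend(loc='upper right')

print([t.get_text() for t in leg.get_texts()])
```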
The loc argument of legend() can take one of the following values: 'best', 'upper right', 'upper left', 'lower left', 'lower right', 'right', 'center left', 'center right', 'lower center', 'upper center', and 'center'. Instead of strings, you can also use numbers in the same order: 'best' corresponds to 0, 'upper right' to 1, 'upper left' to 2, and so on up to 'center', which corresponds to 10. Using the value 'best' moves the legend to a spot less likely to hide data; however, finding that spot can be slow for plots with a lot of data.
Text Rendering
The text(x, y, str) function renders the string str on the figure at the coordinates x, y, given in data units (the same units as the plotted data). You can modify the text alignment using the same alignment arguments as title(), such as ha and va. For example, text(0, 0, 'origin') prints the text at location (0, 0).
The function text() has many other arguments, such as rotation and fontsize.
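A sketch of text() placement and a few of its keyword arguments; the strings 'origin' and 'rotated' and the plotted line are arbitrary.

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([-1, 1], [-1, 1])

# Place a string at data coordinates (0, 0); ha/va control the alignment
t = plt.text(0, 0, 'origin', ha='center', va='bottom')

# Other keyword arguments: rotation (degrees) and fontsize (points)
t2 = plt.text(0.5, -0.5, 'rotated', rotation=45, fontsize=12)
```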
Example:
The example script summarizes the functions we’ve discussed up to this point: plot() for
plotting; title(), xlabel(), ylabel(), and text() for text annotations; and xticks(), ylim(), and grid()
for grid control.
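The example script itself is not reproduced in these notes; the following is a sketch in the same spirit, using a damped cosine as assumed sample data and exercising each function listed above.

```python
import numpy as np
import matplotlib.pyplot as plt

plt.figure()
t = np.linspace(0, 2, 200)                    # time samples (illustrative)
y = np.exp(-t) * np.cos(2 * np.pi * 2 * t)    # damped oscillation

plt.plot(t, y)                                # plotting
plt.title('Damped oscillation')               # text annotations
plt.xlabel('time [seconds]')
plt.ylabel('amplitude')
plt.text(1.0, 0.6, 'envelope: exp(-t)')
plt.xticks([0, 0.5, 1.0, 1.5, 2.0])           # grid control
plt.ylim(-1.1, 1.1)
plt.grid(True)
# plt.show() would display the figure when a GUI backend is available
```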
The object-oriented design of matplotlib includes two functions, setp() and getp(), that set and retrieve a matplotlib object’s parameters. The benefit of using setp() and getp() is that automation is easily achieved: whenever plot() is called, matplotlib returns a list of matplotlib objects (Line2D objects, one per line) whose properties can then be read and changed programmatically.
For example, you can use the getp() function to get the linestyle of a line object. You can use the
setp() function to set the linestyle of a line object.
Here is an example of how to use the getp() and setp() functions to get and set the linestyle of a
line object:
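A minimal sketch matching the description that follows; the x and y lists are arbitrary sample values.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

plt.figure()
lines = plt.plot(x, y)            # plot() returns a list of Line2D objects
line = lines[0]

plt.setp(line, linestyle='--')    # set the linestyle to dashed
style = plt.getp(line, 'linestyle')
print(style)                      # read the property back

plt.show()
```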
This code will create a line plot of the data in the x and y lists. The linestyle of the line object will
be set to dashed. The code will then print the linestyle to the console. Finally, the code will show
the plot.
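A sketch consistent with the description that follows (the x and y values are arbitrary). It keeps an explicit Axes object so that get_xlim() and get_ylim() can be called on it.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("My Plot")

print(ax.get_xlim())   # current x-axis limits
print(ax.get_ylim())   # current y-axis limits

plt.show()
```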
This code will create a line plot of the data in the x and y lists. The x-axis label will be set to
"X-axis", the y-axis label will be set to "Y-axis", and the title of the plot will be set to "My Plot".
The code will then print the x-axis and y-axis limits to the console. Finally, the code will show
the plot.
You can use the get_xlim() and get_ylim() methods of an Axes object to get its current x-axis and y-axis limits, respectively, and the set_xlim() and set_ylim() methods to set them.
Patches
Drawing shapes requires some more care. matplotlib has objects that represent many common shapes, referred to as patches. Some of these, like Rectangle and Circle, are found in matplotlib.pyplot, but the full set is located in matplotlib.patches.
To add a shape to a plot, create the patch object shp and add it to a subplot by calling ax.add_patch(shp).
To work with patches, add them to an already existing graph because, in a sense, patches are “patched” on top of a figure. The table below gives a partial listing of available patches. In this table, the notation xy indicates a list or tuple of (x, y) values.
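A sketch adding three patches to an existing Axes; positions, sizes, and colors are arbitrary. Rectangle and Circle come from pyplot, while Polygon is imported from matplotlib.patches to show the full module.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

fig, ax = plt.subplots()

rect = plt.Rectangle((0.1, 0.1), 0.3, 0.2, color='tab:blue', alpha=0.6)  # xy = lower-left corner
circ = plt.Circle((0.7, 0.3), 0.15, color='tab:red', alpha=0.6)          # xy = center
tri = Polygon([[0.2, 0.6], [0.5, 0.9], [0.8, 0.6]],
              color='tab:green', alpha=0.6)                               # xy = vertices

for shp in (rect, circ, tri):
    ax.add_patch(shp)      # patches are added on top of an existing Axes

ax.set_aspect('equal')     # keep the circle from being drawn as an ellipse
```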
Seaborn
Import Libraries: Import the necessary libraries, including Seaborn and Matplotlib.
Load a Dataset: Seaborn comes with several built-in datasets for practice. You can load one using the load_dataset() function.
tips_data = sns.load_dataset("tips")
Customize Seaborn Styles: Seaborn comes with several built-in styles. You can set the style using
sns.set_style().
sns.set_style("whitegrid")
# Other styles include "darkgrid", "white", "dark", and "ticks"
Advanced Scatter Plots: Create a scatter plot with additional features like hue, size, and style.
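sns.load_dataset() downloads the tips data over the network, so this sketch instead builds a hypothetical stand-in DataFrame with the same column names and random values. It maps day to color (hue), party size to marker area (size), and meal time to marker shape (style).

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for sns.load_dataset("tips"): same column names, random values
rng = np.random.default_rng(0)
n = 100
tips_data = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "size": rng.integers(1, 7, n),
})
tips_data["tip"] = tips_data["total_bill"] * rng.uniform(0.10, 0.25, n)

sns.set_style("whitegrid")
# hue -> color, size -> marker area, style -> marker shape
ax = sns.scatterplot(data=tips_data, x="total_bill", y="tip",
                     hue="day", size="size", style="time")
ax.set_title("Tip vs. total bill")
```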
Pair Plots for Multivariate Analysis: Visualize relationships between multiple variables with pair
plots.
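Correlation Heatmap: Visualize the pairwise correlations between the numeric columns as a color-coded matrix.
correlation_matrix = tips_data.corr(numeric_only=True)  # skip the categorical columns
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")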
Violin Plots: Visualize the distribution of a numerical variable for different categories.
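A violin-plot sketch, again on a hypothetical random stand-in for the tips data. split=True draws the two sex categories as the two halves of each violin.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips dataset: same column names, random values
rng = np.random.default_rng(0)
n = 100
tips_data = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "sex": rng.choice(["Male", "Female"], n),
})

# Distribution of total_bill for each day, split by sex
ax = sns.violinplot(data=tips_data, x="day", y="total_bill", hue="sex",
                    split=True, order=["Thur", "Fri", "Sat", "Sun"])
```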
FacetGrid for Customized Subplots: Use FacetGrid to create custom subplots based on
categorical variables.
Joint Plots: Combine scatter plots with histograms for bivariate analysis.
Question bank:
1. Explain how a simple line plot can be created using Matplotlib. Show the adjustments made to the plot with respect to line colors.
The simplest of all plots is the visualization of a single function y = f(x). Here we will create a simple line plot.
In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a single
container that contains all the objects representing axes, graphics, text, and labels. The
axes (an instance of the class plt.Axes) is what we see above: a bounding box with ticks
and labels, which will eventually contain the plot elements that make up the visualization.
Alternatively, we can use the pylab interface, which creates the figure and axes in the
background. Ex: plt.plot(x, np.sin(x))
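A sketch of both styles described above, plotting sin(x) on 0 to 10 as sample data: first with an explicitly created figure and axes, then with plt.plot(), which creates them in the background.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)

# Explicit figure and axes: plot by calling a method of the Axes object
fig = plt.figure()
ax = plt.axes()
ax.plot(x, np.sin(x))

# pylab-style: plt.plot() draws on the current axes of the current figure
plt.figure()
plt.plot(x, np.sin(x))
```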
The plt.plot() function takes additional arguments that can be used to adjust the line. To adjust the color, use the color keyword, which accepts a string argument representing virtually any imaginable color. The color can be specified in a variety of ways.
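A sketch of several accepted color formats, each applied to a shifted sine curve (sample data). matplotlib.colors.to_hex() is used only to show that every format resolves to a concrete color.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

plt.figure()
x = np.linspace(0, 10, 100)

plt.plot(x, np.sin(x - 0), color='blue')           # color name
plt.plot(x, np.sin(x - 1), color='g')              # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75')           # grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44')        # hex code (RRGGBB)
plt.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))  # RGB tuple, values 0 to 1
plt.plot(x, np.sin(x - 5), color='chartreuse')     # any HTML color name

for line in plt.gca().lines:
    print(mcolors.to_hex(line.get_color()))        # each format as a hex string
```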
MATLAB-style interface vs. object-oriented interface
MATLAB-style interface:
• Matplotlib was originally written as a Python alternative for MATLAB users, and much of its syntax reflects that fact.
• The MATLAB-style tools are contained in the pyplot (plt) interface.
• The interface is stateful: it keeps track of the “current” figure and axes, where all plt commands are applied. Once the second panel is created, going back and adding something to the first is a bit complex.
Object-oriented interface:
• The object-oriented interface is available for these more complicated situations, and for when we want more control over the figure.
• Rather than depending on some notion of an “active” figure or axes, in the object-oriented interface the plotting functions are methods of explicit Figure and Axes objects.
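A sketch of the same two-panel plot in both interfaces, with sine and cosine as sample data. In the object-oriented version, returning to the first panel after creating the second is just a method call on ax[0].

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# MATLAB-style (stateful): each plt command acts on the "current" axes
plt.figure()
plt.subplot(2, 1, 1)          # (rows, columns, panel number)
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))

# Object-oriented: explicit Figure and Axes objects
fig, ax = plt.subplots(2)     # ax is an array of two Axes objects
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))
ax[0].set_title('first panel, edited after the second was created')
```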
3. Write the lines of code to create a simple histogram using the Matplotlib library.
A simple histogram can be useful in understanding a dataset; in Matplotlib it is created with the plt.hist() function.
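A minimal histogram sketch, assuming 1000 normally distributed random values as the dataset and 30 bins as an arbitrary choice. plt.hist() returns the bin counts, the bin edges, and the drawn bar patches.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.normal(size=1000)      # illustrative random sample

plt.figure()
counts, bin_edges, patches = plt.hist(data, bins=30,
                                      color='steelblue', edgecolor='black')
```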
4. What are the two ways to adjust axis limits of the plot using Matplotlib? Explain with the example
for each.
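Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it’s nice to have finer control.
• using plt.xlim() and plt.ylim()
These functions set the x-axis and y-axis limits separately: plt.xlim(xmin, xmax) and plt.ylim(ymin, ymax).
• using plt.axis()
The plt.axis() function allows you to set the x and y limits with a single call, by passing a list that specifies [xmin, xmax, ymin, ymax].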
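A sketch of both approaches, plt.xlim()/plt.ylim() (introduced in the Axis section) and plt.axis(), applied to the same sine curve; the limit values are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)

# Way 1: set each axis separately
plt.figure()
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5)

# Way 2: set [xmin, xmax, ymin, ymax] in a single call
plt.figure()
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5])
```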
5. List the differences between the plot() and scatter() functions when creating a scatter plot.
• The difference between the two functions is: with pyplot.plot() any property you apply
(color, shape, size of points) will be applied across all points whereas in pyplot.scatter() you
have more control in each point’s appearance. That is, in plt.scatter() you can have the color,
shape and size of each dot (datapoint) to vary based on another variable.
• While it doesn’t matter as much for small amounts of data, as datasets get larger than a
few thousand points, plt.plot can be noticeably more efficient than plt.scatter. The reason is
that plt.scatter has the capability to render a different size and/or color for each point, so
the renderer must do the extra work of constructing each point individually. In plt.plot, on
the other hand, the points are always essentially clones of each other, so the work of
determining the appearance of the points is done only once for the entire set of data.
• For large datasets, the difference between these two can lead to vastly different performance,
and for this reason, plt.plot should be preferred over plt.scatter for large datasets.
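A sketch of the difference, assuming random data: plt.plot() with a marker format gives every point the same appearance, while plt.scatter() maps extra variables to per-point color (c) and size (s).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.random(100), rng.random(100)
colors = rng.random(100)           # a third variable mapped to color
sizes = 1000 * rng.random(100)     # a fourth variable mapped to marker size

# plt.plot(): one marker style, color, and size for every point
plt.figure()
pts, = plt.plot(x, y, 'o', color='black')

# plt.scatter(): color and size can vary point by point
plt.figure()
sc = plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar()                     # shows the color scale for c
```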
6. How can the default plot settings of Matplotlib be customized with respect to runtime configuration and stylesheets? Explain with a suitable code snippet.
• Each time Matplotlib loads, it defines a runtime configuration (rc) containing the default
styles for every plot element we create.
• We can adjust this configuration at any time using the plt.rc convenience routine.
• To modify the rc parameters, we’ll start by saving a copy of the current rcParams
dictionary, so we can easily reset these changes in the current session:
IPython_default = plt.rcParams.copy()
• Now we can use the plt.rc function to change some of these settings:
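A sketch of the workflow just described: save the rc settings, change them with plt.rc(group, **kwargs), temporarily apply the built-in 'ggplot' stylesheet with plt.style.context(), then restore the saved copy. The specific colors and widths are illustrative choices.

```python
import matplotlib.pyplot as plt

IPython_default = plt.rcParams.copy()   # save the current settings

# plt.rc(group, **kwargs) updates the "group.key" entries of rcParams
plt.rc('axes', facecolor='#E6E6E6', edgecolor='none', axisbelow=True, grid=True)
plt.rc('grid', color='w', linestyle='solid')
plt.rc('xtick', direction='out', color='gray')
plt.rc('ytick', direction='out', color='gray')
plt.rc('lines', linewidth=2)

lw_during = plt.rcParams['lines.linewidth']
print(lw_during)

# A stylesheet applied only inside the with-block
with plt.style.context('ggplot'):
    plt.figure()
    plt.plot([1, 2, 3], [1, 4, 9])

plt.rcParams.update(IPython_default)    # restore the saved settings
```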
Seaborn vs. Matplotlib
Let us assume
x = [10, 20, 30, 45, 60]
y = [0.5, 0.2, 0.5, 0.3, 0.5]
Matplotlib:
# to plot the graph
import matplotlib.pyplot as plt
plt.style.use('classic')
plt.plot(x, y)
plt.legend(list('ABCDEF'), ncol=2, loc='upper left')
Seaborn:
# to plot the graph
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.plot(x, y)
plt.legend(list('ABCDEF'), ncol=2, loc='upper left')
8. List and describe different categories of colormaps with suitable code snippets.
There are three different categories of colormaps:
Sequential colormaps: These consist of one continuous sequence of colors (e.g., binary or viridis).
Divergent colormaps: These usually contain two distinct colors, which show positive and negative deviations from a mean (e.g., RdBu or PuOr).
Qualitative colormaps: These mix colors with no particular sequence (e.g., rainbow or jet).
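A sketch showing one colormap from each category applied to the same 2-D array (an arbitrary sin times cos pattern) with imshow(): viridis (sequential), RdBu (divergent), and jet (qualitative).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])   # 2-D array displayed as an image

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, cmap in zip(axes, ['viridis', 'RdBu', 'jet']):  # sequential, divergent, qualitative
    im = ax.imshow(I, cmap=cmap)
    ax.set_title(cmap)
    fig.colorbar(im, ax=ax)
```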
10. With a suitable example, describe how to draw histogram and KDE plots using Seaborn.
Often in statistical data visualization, all we want is to plot histograms and joint
distributions of variables.