1 Introduction to Machine Learning with Python
"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed."
— Arthur Samuel
Contents (excerpt)
1.5.3 Array Manipulation Functions
1.6 SciPy
1.7 Matplotlib
1.7.4 Colormaps
1.7.7 3D Plotting
1.8 scikit-learn
Machine learning (ML) is a subset of artificial intelligence (AI) that enables systems to
automatically learn from data, identify patterns, and make decisions with minimal human
intervention. Instead of being explicitly programmed to perform a task, machine learning
algorithms use statistical techniques to improve their performance on a given task based on
experience or data.
The need for machine learning arises from the vast amounts of data generated in today’s digi-
tal age. Traditional programming, where rules are explicitly defined by humans, falls short in
handling complex, dynamic environments. For example, identifying patterns in large datasets,
recognizing speech, or translating languages involves complexities that are difficult to encode into
fixed rules. Machine learning allows systems to automatically learn and adapt from data, making
it essential for tasks where manual programming is infeasible.
Machine learning has evolved significantly since its inception. In the 1950s and 1960s, early neural
networks, inspired by the human brain, laid the foundation for learning systems. However, due
to limited computational power and theoretical understanding, progress was slow. The 1980s saw
the resurgence of interest in neural networks, particularly with the development of backpropaga-
tion algorithms. The 1990s introduced support vector machines and ensemble methods, which
enhanced the robustness and accuracy of machine learning models. The advent of the internet
and big data in the 2000s provided the fuel for modern machine learning, leading to the deep
learning revolution in the 2010s.
Python’s rise to prominence in the machine learning community is no coincidence. Python’s sim-
plicity and readability make it an ideal language for prototyping and experimentation, which are
crucial in machine learning. Additionally, the Python ecosystem is rich with libraries like NumPy,
SciPy, and Pandas for numerical computations, Matplotlib and Seaborn for data visualization,
and scikit-learn for machine learning algorithms. The development of deep learning frameworks
like TensorFlow, PyTorch, and Keras further cemented Python’s status as the go-to language for
machine learning, allowing researchers and engineers to build complex models with relative ease.
Machine learning is now ubiquitous, finding applications across various domains. In healthcare,
machine learning models are used for predicting patient outcomes, drug discovery, and personal-
ized medicine. In finance, algorithms help in fraud detection, stock market prediction, and risk
management. In the tech industry, machine learning drives recommendation systems, search en-
gines, and autonomous vehicles. Additionally, natural language processing, a subset of machine
learning, powers virtual assistants like Siri and Alexa, enabling them to understand and respond
to human queries.
Data is the lifeblood of machine learning. The effectiveness of a machine learning model largely
depends on the quality and quantity of the data it is trained on. Clean, well-labeled data allows
models to learn accurate patterns and make reliable predictions. On the other hand, noisy or
biased data can lead to models that perform poorly or perpetuate harmful biases. This highlights
the importance of data preprocessing, feature engineering, and careful selection of training data
in the machine learning pipeline.
The future of machine learning is promising, with ongoing research pushing the boundaries of
what is possible. Advances in unsupervised learning, where models learn from unstructured data
without explicit labels, are expected to unlock new possibilities. Additionally, reinforcement
learning, which involves training models through trial and error, is poised to revolutionize areas
like robotics and autonomous systems. The integration of machine learning with other emerging
technologies, such as quantum computing and edge computing, will likely lead to even more
powerful and efficient models.
As machine learning systems become more prevalent, ethical considerations are becoming in-
creasingly important. Issues such as data privacy, algorithmic bias, and the potential for job
displacement are at the forefront of discussions about the societal impact of machine learning.
Ensuring that machine learning models are fair, transparent, and accountable is crucial for main-
taining public trust and ensuring that the technology is used responsibly.
Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are closely related
fields, with each successive field a subset of the previous one. AI is the broadest field that encompasses the development of
machines that can imitate human intelligence and perform cognitive tasks. Machine Learning,
a subset of AI, focuses on algorithms that enable systems to learn from data without explicit
programming. Deep Learning, which is a subset of ML, leverages neural networks designed to
simulate the human brain, allowing machines to make decisions by learning from large datasets.
The relationship between these fields can be visually represented through the following diagram:
[Figure: three nested circles — Artificial Intelligence (outermost), Machine Learning, and Deep Learning (innermost). The label for the middle circle reads: "Machine Learning (ML): A subset of AI that focuses on creating algorithms that allow machines to learn from data and improve their performance over time without explicit programming."]
The diagram above illustrates the nested relationship between AI, ML, and DL. Artificial In-
telligence (AI) is the overarching field that aims to develop machines capable of performing
tasks that require human-like intelligence, such as problem-solving, decision-making, and nat-
ural language understanding. Machine Learning (ML), a subset of AI, focuses on building
algorithms that allow systems to learn from data and improve over time without being explicitly
programmed. Deep Learning (DL), the deepest layer in this hierarchy, uses neural networks
to simulate the human brain’s functioning, enabling machines to analyze vast amounts of data
and make accurate predictions.
Machine learning has a global impact, transforming industries, economies, and societies. It is
enabling new business models, improving efficiency, and fostering innovation in fields as diverse
as agriculture, energy, and education. Developing countries are leveraging machine learning to
solve unique challenges, such as optimizing resource allocation and improving access to healthcare.
As machine learning continues to evolve, its ability to address some of the world’s most pressing
problems is becoming increasingly apparent, making it a critical tool for future progress.
The confusion matrix provides insights into how well the classifier is performing on each
class [7].

– Precision: Precision is the ratio of correctly predicted positive observations to all predicted positives:

\[ \text{Precision} = \frac{TP}{TP + FP} \]

– Recall: Recall is the ratio of correctly predicted positive observations to all actual positives:

\[ \text{Recall} = \frac{TP}{TP + FN} \]

These metrics are useful in assessing the performance of a classification model, especially
in cases where class imbalance is present [14].
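As a brief illustration, scikit-learn provides these metrics directly. The following sketch uses made-up labels purely for illustration:

from sklearn.metrics import precision_score, recall_score, confusion_matrix

# Made-up true labels and predictions (assumed for illustration)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))  # 0.75
print("Recall:", recall_score(y_true, y_pred))        # 0.75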
[Figure: taxonomy of machine learning — Supervised Learning (Classification, Regression), Unsupervised Learning (Clustering, Dimensionality Reduction), and Reinforcement Learning (Value-Based, Policy-Based, Model-Based).]
Machine learning can be categorized into three primary types: Supervised Learning, Unsupervised
Learning, and Reinforcement Learning. These categories are based on the nature of the learning
signal or feedback that the algorithm receives. Below, we delve into each type with a detailed
explanation of its subtypes.
Classification

In classification, the goal is to predict a discrete class label for each input. The model learns a mapping from the input features to one of a fixed set of categories.

• Examples:
– Image recognition: Identifying whether an image contains a dog, cat, or other animals.
• Algorithms:
– Logistic Regression
– Decision Trees
– Random Forests
– Neural Networks
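A minimal sketch of one of the classifiers listed above, trained on the Iris dataset (the choice of decision tree is an assumption for illustration):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict(X[:3]))  # predicted class labels for the first 3 samples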
Regression
In regression, the goal is to predict a continuous value, unlike classification, which deals with
categorical outputs. The model learns a mapping from the input features to a continuous output.
• Examples:
– House price prediction: Predicting the price of a house based on features like location,
size, and number of rooms.
– Stock price prediction: Estimating the future price of stocks based on historical data.
• Algorithms:
– Linear Regression
– Polynomial Regression
– Ridge Regression
– Lasso Regression
– Neural Networks
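As a brief illustration of the first algorithm listed above, the following sketch fits a linear regression to made-up house-size and price data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: house size in square meters vs. price (for illustration only)
X = np.array([[50], [80], [120], [200]])
y = np.array([150_000, 240_000, 360_000, 600_000])

model = LinearRegression().fit(X, y)
print(model.predict([[100]]))  # predicted price for a 100 m^2 house, about [300000.]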
Clustering
Clustering is the process of grouping a set of objects in such a way that objects in the same group
(called a cluster) are more similar to each other than to those in other groups. It is commonly
used in exploratory data analysis to identify natural groupings in data.
• Examples:
– Customer segmentation: Grouping customers based on purchasing behavior.
– Anomaly detection: Detecting unusual patterns or outliers in data, such as fraud
detection.
– Document clustering: Grouping documents with similar topics.
• Algorithms:
– K-Means Clustering
– Hierarchical Clustering
– DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
– Gaussian Mixture Models (GMM)
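A minimal sketch of the first algorithm listed above, grouping made-up 2D points into two clusters:

import numpy as np
from sklearn.cluster import KMeans

# Made-up points forming two loose groups (assumed for illustration)
X = np.array([[1, 2], [1.5, 1.8], [1, 0.6], [8, 8], [9, 9], [8.5, 9.5]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # coordinates of the two centroids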
Dimensionality Reduction
Dimensionality reduction involves reducing the number of input variables (or dimensions) in a
dataset while retaining as much information as possible. This is important in cases where having
too many input features (the curse of dimensionality) can degrade model performance.
• Examples:
– Principal Component Analysis (PCA): A linear technique used for reducing dimensions
while retaining the variance in data.
– t-SNE (t-distributed Stochastic Neighbor Embedding): A non-linear technique for di-
mensionality reduction, especially useful for visualization.
– Feature selection: Selecting the most important features while ignoring less useful ones.
• Algorithms:
– PCA (Principal Component Analysis)
– t-SNE
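A minimal sketch of PCA reducing made-up 10-dimensional data to two components:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # made-up data: 100 samples, 10 features

pca = PCA(n_components=2)       # keep the two strongest components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # fraction of variance each component retains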
Reinforcement Learning

In reinforcement learning, an agent learns by interacting with an environment: it takes actions, receives rewards or penalties as feedback, and gradually adjusts its behavior to maximize the cumulative reward.

• Input: Actions taken by the agent and feedback from the environment
Reinforcement learning is commonly used in robotics, game AI, and autonomous systems. It is
subdivided into the following categories:
Value-Based Methods
In value-based reinforcement learning, the goal is to estimate the value of being in a particular
state or taking a particular action. The agent seeks to maximize the total expected rewards over
time by choosing actions based on these value estimates.
• Examples:
– Q-learning: An off-policy method that learns the value of actions without requiring a
model of the environment.
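A minimal sketch of the core Q-learning update rule; the environment size, constants, and the sample transition are all made up for illustration:

import numpy as np

# Toy setup (assumed): 5 states, 2 actions
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

def q_update(state, action, reward, next_state):
    # Move Q(s, a) toward the observed reward plus the discounted
    # value of the best action available in the next state.
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

# One hypothetical transition: action 1 in state 0 yields reward 1.0
# and lands in state 2.
q_update(state=0, action=1, reward=1.0, next_state=2)
print(Q)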
Policy-Based Methods
In policy-based reinforcement learning, the agent directly learns a policy (a mapping from states
to actions) without estimating value functions. These methods are particularly useful when the
action space is continuous, such as in robotic control tasks.
• Examples:
– Policy gradient methods such as REINFORCE, which adjust the policy parameters directly in the direction that increases expected reward.

Model-Based Methods

In model-based reinforcement learning, the agent attempts to learn a model of the environment
(i.e., the dynamics of how the environment behaves). Using this model, the agent can simulate
future states and plan actions more effectively.
• Examples:
– AlphaGo: Uses a model-based approach for planning moves in the game of Go.
1. Image Recognition: Machine learning is widely used to identify objects, people, and patterns in images. Applications include:

• Social Media Auto-tagging: Platforms like Facebook use ML-based facial recognition
to suggest tags for friends in uploaded images.
• Medical Imaging: ML algorithms are employed to detect and diagnose diseases from
medical scans, such as X-rays or MRIs.
2. Speech Recognition: Machine learning is extensively used in converting voice into text,
a technology often called "speech-to-text." Applications include:
• Voice Search: Google’s voice search enables users to search for information using speech
input instead of typing.
• Virtual Assistants: Siri, Alexa, and Google Assistant rely on speech recognition to
follow and act on voice commands.
3. Traffic Prediction: Google Maps and other navigation services predict traffic by using
machine learning in the following ways:
• Real-Time Traffic Updates: Traffic conditions are predicted by analyzing data from
users’ smartphones and sensors in vehicles.
• Historical Data Analysis: ML algorithms look at traffic patterns from past data to
forecast future traffic conditions.
Finance: In the finance industry, machine learning is used for credit scoring, fraud detection,
and algorithmic trading. By analyzing historical financial data, machine learning models can
predict market trends, identify fraudulent transactions, and assess credit risks more effectively
than traditional methods [4].
Retail: Retailers use machine learning to enhance customer experience through personalized
recommendations, inventory management, and dynamic pricing strategies. Machine learning
algorithms analyze customer behavior, purchase history, and market trends to suggest products
that are likely to interest customers [16].
Natural Language Processing (NLP): NLP is a branch of machine learning that focuses on
the interaction between computers and human language. Applications include machine transla-
tion, sentiment analysis, and chatbots. Machine learning models are trained on large text datasets
to understand and generate human language in a way that is useful for various applications [11].
Agriculture: In agriculture, machine learning is used for precision farming, crop monitoring,
and yield prediction. Machine learning models can analyze data from drones, satellites, and
sensors to monitor crop health, optimize irrigation, and predict harvest yields, leading to more
efficient and sustainable farming practices [12].
3. Interpretability
Challenge: Many machine learning models, particularly deep neural networks, operate as "black boxes," offering little insight into how they arrive at a decision.
Example: In banking, if a neural network denies a loan application, the bank may not be able to explain why the decision was made, which raises concerns among customers and regulators.
4. Scalability
Challenge: Machine learning models often need to be trained on massive datasets, which
requires high computational resources and careful optimization to scale to large data sizes.
Example: Social media companies like Facebook must process and analyze vast amounts
of user data in real-time to train recommendation algorithms. Handling such large datasets
efficiently is a major challenge.
5. Bias and Fairness
Challenge: Models trained on unrepresentative or biased data can reproduce and amplify that bias in their predictions.
Example: In facial recognition systems, bias in the training data has led to higher misidentification rates for people of color, raising serious ethical concerns when the technology is used in law enforcement.
6. Data Privacy
Challenge: Effective models often require large amounts of personal data, which must be collected and used without violating privacy.
Example: In personalized healthcare, machine learning models may need access to sensitive patient data to provide accurate diagnoses, but sharing this data without compromising privacy is a significant challenge.
7. Generalization
Challenge: A model trained on data from one domain or environment may perform poorly when deployed in a different one.
Example: A model trained to detect objects on U.S. highways may not work well in European countries due to differences in vehicle types, road infrastructure, and traffic patterns.
pip install numpy scipy matplotlib scikit-learn

• This command will download and install the necessary packages for machine learning.

conda install numpy scipy matplotlib scikit-learn

• This will install the packages needed for scientific computing and machine learning.

conda --version

• This should display the installed version of conda, the package manager included with
Anaconda.
4. Create a New Conda Environment (Optional):
• Anaconda already includes most of the essential packages. However, if you need to
install any additional packages, you can do so with:

conda install package-name

(replacing package-name with the package you need)
To ensure everything is installed correctly, open a Python interactive shell (by typing python or
python3 in your terminal or command prompt) and run the following commands:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import sklearn
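Optionally, you can also print the installed versions as a quick check (each of these packages exposes a __version__ attribute):

print(np.__version__)
print(scipy.__version__)
print(sklearn.__version__)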
If there are no errors, your setup is complete, and you can start working on machine learning
projects!
NumPy, SciPy, Matplotlib, and scikit-learn are essential libraries in Python’s scientific computing
and machine learning ecosystem. NumPy is the foundational package for numerical computing,
providing support for large, multi-dimensional arrays and matrices, along with mathematical
functions to operate on them. SciPy builds on NumPy and provides additional functions for
scientific and technical computing, including optimization, integration, interpolation, and linear
algebra. Matplotlib is a powerful library for creating visualizations, offering extensive plotting
capabilities for both static and interactive graphs. Scikit-learn is a widely-used machine learning
library that provides simple and efficient tools for data mining and analysis, including classifica-
tion, regression, clustering, and model evaluation. Together, these libraries form the backbone of
Python’s data science and machine learning workflows.
NumPy is a fundamental Python library for numerical and scientific computing. It provides
support for large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays efficiently. NumPy is widely used in data science, machine
learning, and other technical computing tasks. Here are some commonly used NumPy functions:
• Array Creation Functions: array, zeros, ones, empty, arange, linspace
• Mathematical Functions: sum, mean, std, max, min, prod, dot
• Array Manipulation Functions: reshape, transpose, concatenate, stack, split
• Indexing and Slicing: Accessing parts of arrays using slices, boolean indexing, or fancy
indexing
• Statistical Functions: mean, median, std, var, percentile
• Linear Algebra Functions: dot, cross, inv, det, eig, svd
• Random Functions: random.rand, random.randn, random.randint, random.choice
• Sorting and Searching: sort, argsort, searchsorted
import numpy as np
Example - zeros
The NumPy zeros() function is used to create a new array of specified shape and size, filled entirely
with zeros. This function is particularly useful in initializing arrays where you need a baseline or
placeholder values of zeros, often for tasks like matrix initialization, creating masks, or defining
default values in iterative algorithms. The zeros() function requires the shape of the array as
an argument, which can be a tuple specifying the dimensions for multi-dimensional arrays. For
example, np.zeros((3, 4)) will create a 3x4 matrix with all elements set to 0. Additionally, the
data type of the elements can be specified using the dtype parameter, with the default being
float64. This function is efficient and widely used in numerical computations where arrays need
to be initialized with zeros before further processing.
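A minimal snippet consistent with the output shown below:

import numpy as np

# Create a 2x3 array filled with zeros
arr = np.zeros((2, 3))
print("Array of zeros:\n", arr)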
Array of zeros:
[[0. 0. 0.]
[0. 0. 0.]]
Example - mean
The NumPy mean() function calculates the arithmetic mean (average) of the elements in a NumPy
array along a specified axis. The mean is computed as the sum of the elements divided by the num-
ber of elements. The basic syntax is np.mean(array, axis=None, dtype=None, keepdims=False),
where:
• array is the input NumPy array,
• axis (optional) specifies the axis along which the mean is computed. If no axis is provided,
it computes the mean of the flattened array (i.e., all elements),
• dtype (optional) allows specifying the data type for the result to avoid overflow,
• keepdims (optional) retains the dimensions of the input array if set to True.
For example, np.mean([1, 2, 3, 4, 5]) returns 3.0, which is the mean of the numbers. For a 2D
array like np.mean([[1, 2], [3, 4]], axis=0), the result is [2. 3.], which is the mean along the
columns. The mean() function is commonly used in data analysis and statistics to summarize
datasets.
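A short snippet illustrating both calls:

import numpy as np

print(np.mean([1, 2, 3, 4, 5]))           # 3.0
print(np.mean([[1, 2], [3, 4]], axis=0))  # [2. 3.]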
Example - dot
The NumPy dot() function computes the dot product of two arrays. For 1D arrays, it calculates
the inner product, which is the sum of the products of corresponding elements. For 2D arrays
(matrices), dot() performs matrix multiplication. If either or both inputs are multi-dimensional
arrays (more than 2D), it computes the generalized matrix product.
The basic syntax is np.dot(arr1, arr2, out=None), where:
• arr1 and arr2 are the input arrays or matrices,
• out (optional) specifies an output array to store the result.
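A short snippet illustrating dot() for the 1D and 2D cases:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))   # inner product of 1D arrays: 32

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))   # matrix multiplication of 2D arrays

Example - reshape

The NumPy reshape() function changes the shape of an array without changing its data. A snippet consistent with the output shown below, assuming the original example reshaped np.arange(6) into two rows of three:

arr = np.arange(6)
reshaped_arr = arr.reshape(2, 3)
print("Reshaped array:\n", reshaped_arr)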
Reshaped array:
[[0 1 2]
[3 4 5]]
Example - concatenate
The NumPy concatenate() function is used to join two or more arrays along a specified axis. It
can be used to merge arrays either row-wise (along axis 0) or column-wise (along axis 1), and for
higher-dimensional arrays as well. The basic syntax is np.concatenate((arr1, arr2, ...), axis=0),
where:
• arr1, arr2, ... are the arrays to be concatenated,
• axis (optional) specifies the axis along which the arrays are joined. The default is axis=0
(row-wise for 2D arrays).
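A snippet consistent with the output shown below (the input arrays are inferred from the output):

import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

# Join the two arrays row-wise (along axis 0)
concatenated = np.concatenate((arr1, arr2), axis=0)
print("Concatenated array:\n", concatenated)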
Concatenated array:
[[1 2]
[3 4]
[5 6]]
# Slicing an array
arr = np.array([10, 20, 30, 40, 50])
sliced_arr = arr[1:4] # Extract elements from index 1 to 3
print("\nSliced array:", sliced_arr)
Example - median
# Median of an array
arr = np.array([1, 3, 5, 2, 4])
median_value = np.median(arr)
print("\nMedian of array:", median_value)
Example - std
The NumPy std() function is used to compute the standard deviation of the elements in a NumPy
array. The standard deviation is a measure of the amount of variation or dispersion of a set of
values. A low standard deviation means that the values are close to the mean, while a high
standard deviation indicates that the values are spread out over a wider range.
The syntax is np.std(array, axis=None, dtype=None, out=None, ddof=0, keepdims=False),
where:
• array is the input array,
• axis (optional) specifies the axis along which to compute the standard deviation. If no axis
is specified, the standard deviation of the flattened array is calculated,
• ddof (optional) is the "delta degrees of freedom": the divisor used in the calculation is
N − ddof, where N is the number of elements (the default is 0).
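A short snippet:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print("Standard deviation:", np.std(arr))  # about 1.414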
Example - inv

The NumPy inv() function computes the inverse of a square matrix A, that is, the matrix \(A^{-1}\) satisfying

\[ A \cdot A^{-1} = I \]

The inv() function is part of the numpy.linalg module, which contains various linear algebra
operations.
The syntax is:
np.linalg.inv(array)
where:
• array is a square matrix (i.e., the number of rows equals the number of columns) that you
want to invert.
Key Points:
• Square Matrix: The matrix must be square (i.e., it must have the same number of rows
and columns).
• Singular Matrix: If a matrix is singular (i.e., it doesn’t have an inverse),
np.linalg.inv() will raise a LinAlgError. Singular matrices are those for which the
determinant is 0.
• Identity Matrix: The inverse of a matrix, when multiplied by the original matrix, results
in the identity matrix:
\[ A \cdot A^{-1} = I \]
# Matrix inversion
from numpy.linalg import inv

matrix = np.array([[1, 2], [3, 4]])
inverse_matrix = inv(matrix)
print("\nInverse of matrix:\n", inverse_matrix)
Inverse of matrix:
[[-2. 1. ]
[ 1.5 -0.5]]
Example - eig

The NumPy eig() function computes the eigenvalues and eigenvectors of a square matrix. An eigenvalue λ and eigenvector v of a matrix A satisfy

\[ A \cdot v = \lambda \cdot v \]
where:
• A is the square matrix.
• v is the eigenvector.
• λ is the eigenvalue corresponding to the eigenvector v.
The syntax for the eig() function is:
np.linalg.eig(array)
where:
• array is the square matrix for which you want to compute the eigenvalues and eigenvectors.
The function returns two outputs:
• An array of eigenvalues.
• An array of eigenvectors corresponding to each eigenvalue.
Key Points:
• Square Matrix: The matrix must be square (i.e., the number of rows equals the number
of columns).
• Eigenvalues: These are the scalars λ that satisfy the equation A · v = λ · v.
• Eigenvectors: These are the non-zero vectors v that, when multiplied by matrix A, pro-
duce a scalar multiple of themselves (i.e., λ · v).
The output will give the eigenvalues λ and the corresponding eigenvectors v.
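A snippet consistent with the output shown below (the matrix [[1, 2], [2, 1]] is inferred from the output):

import numpy as np

A = np.array([[1, 2], [2, 1]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)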
Eigenvalues:
[ 3. -1.]
Eigenvectors:
[[ 0.70710678 -0.70710678]
[ 0.70710678 0.70710678]]
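Example - random.rand

The NumPy random.rand() function returns an array of the given shape filled with uniformly distributed random values in the interval [0, 1). A snippet consistent with the output shown below (the exact values change on every run):

import numpy as np

arr = np.random.rand(3, 3)
print("Random array:\n", arr)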
Random array:
[[0.43092672 0.7761631 0.30247136]
[0.74680129 0.35752529 0.39506951]
[0.46477145 0.26267473 0.35235185]]
Example - sort

The NumPy sort() function returns a sorted copy of an array. The syntax is:

np.sort(array, axis=-1, kind=None, order=None)

where:
• array: The input array to be sorted.
• axis (optional): The axis along which to sort. If axis=-1 (default), it sorts along the last
axis.
• kind (optional): The sorting algorithm to use. Available options are ’quicksort’ (default),
’mergesort’, ’heapsort’, and ’stable’.
• order (optional): If the array contains fields, this specifies which fields to compare when
sorting.
Example 1: Sorting a 1D Array
np.sort([3, 1, 2, 5, 4])
Key Points:
• np.sort() sorts elements in ascending order by default.
• You can specify the axis along which sorting is to be performed in multi-dimensional arrays.
• Several sorting algorithms can be specified, such as ’quicksort’, ’mergesort’, and
’heapsort’.
• Sorting can be applied to structured arrays using the order parameter.
# Sorting an array
arr = np.array([3, 1, 2, 5, 4])
sorted_arr = np.sort(arr)
print("\nSorted array:", sorted_arr)
Sorted array: [1 2 3 4 5]
These examples illustrate the flexibility and power of NumPy functions for scientific computing,
data analysis, and machine learning tasks.
1.6 SciPy
SciPy is a Python library that is used for scientific and technical computing. It builds on NumPy
and provides a variety of functions for optimization, integration, interpolation, linear algebra,
statistics, signal processing, and more. Below are some commonly used functions in SciPy along
with suitable examples for each.

Commonly Used SciPy Functions:

• Optimization: minimize, fsolve, curve_fit
• Interpolation: interp1d
• Signal Processing: convolve, fft
• Linear Algebra: inv, eig
• Statistics: ttest_ind

Example - minimize

The SciPy minimize() function from the scipy.optimize module is used to find a minimum of a scalar objective function. The syntax is:

scipy.optimize.minimize(fun, x0, args=(), method=None, ...)

where:
• fun: The objective function to minimize.
• x0: Initial guess for the variables (starting point).
• args (optional): Extra arguments passed to the objective function.
• method (optional): The optimization algorithm to use, such as ’BFGS’, ’Nelder-Mead’,
’CG’, etc.
• Other parameters such as tol, bounds, and constraints can be provided to control the
optimization process.
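A minimal sketch, minimizing a simple quadratic (the objective function is assumed for illustration):

from scipy.optimize import minimize

def objective(x):
    return (x[0] - 3.0) ** 2 + 1.0  # minimum at x = 3

result = minimize(objective, x0=[0.0], method='BFGS')
print(result.x)    # approximately [3.]
print(result.fun)  # approximately 1.0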
Example - fsolve
The SciPy fsolve() function is used to find the roots of a system of nonlinear equations. Given
a function f(x), the goal of fsolve() is to find the value of x such that f(x) = 0. It is part of
the scipy.optimize module.
The syntax for fsolve() is:
scipy.optimize.fsolve(func, x0, args=(), fprime=None, ...)

where:
• func: The objective function or system of equations for which roots are to be found.
• x0: The initial guess for the solution.
• args (optional): Extra arguments to pass to the objective function.
• fprime (optional): The Jacobian of the system of equations, which can be provided for
better convergence.
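A minimal sketch, solving x² − 4 = 0 (the equation is assumed for illustration):

from scipy.optimize import fsolve

def equation(x):
    return x**2 - 4

root = fsolve(equation, x0=1.0)
print(root)  # [2.]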
Example - curve_fit
The curve_fit function from the scipy.optimize module is used to fit a curve to a set of data
points using nonlinear least squares. It is typically used to find the best-fitting parameters of a
predefined model function for a given dataset.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
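# NOTE: a minimal sketch; the model (an exponential decay with parameters
# a and b) and the synthetic data are assumed for illustration.
def model(x, a, b):
    return a * np.exp(-b * x)

# Generate noisy synthetic data from the model
x_data = np.linspace(0, 4, 50)
y_data = model(x_data, 2.5, 1.3) + 0.1 * np.random.normal(size=x_data.size)

# Fit the model; popt holds the best-fit values of a and b
popt, pcov = curve_fit(model, x_data, y_data)

plt.plot(x_data, y_data, 'o', label='data')
plt.plot(x_data, model(x_data, *popt), label='fitted curve')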
plt.legend()
plt.show()
Example - interp1d

The interp1d() function from the scipy.interpolate module constructs an interpolation function from sampled data points, which can then be evaluated at new x values.

import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
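# A minimal sketch (the sample data and the interpolation kind are assumed
# for illustration): interpolate sparse sine samples onto a finer grid.
x = np.linspace(0, 10, 10)
y = np.sin(x)
f = interp1d(x, y, kind='cubic')

x_new = np.linspace(0, 10, 100)
plt.plot(x, y, 'o', label='samples')
plt.plot(x_new, f(x_new), '-', label='cubic interpolation')
plt.legend()
plt.show()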
Example - convolve

The convolve() function from the scipy.signal module computes the convolution of two signals, an operation widely used for filtering and smoothing.

import numpy as np
from scipy.signal import convolve

# Two simple input signals (assumed for illustration)
signal1 = np.array([1, 2, 3])
signal2 = np.array([0, 1, 0.5])
# Perform convolution
convolved_signal = convolve(signal1, signal2)
print("Convolved signal:", convolved_signal)
Example - fft
The Fast Fourier Transform (FFT), available in the scipy.fft module, is used to efficiently compute
the discrete Fourier transform (DFT) of a sequence, which transforms a signal from its time or
spatial domain into its frequency domain. This is useful in signal processing, audio analysis, image
analysis, and many scientific fields to analyze the frequency components of a signal, filter noise, or
compress data. FFT significantly reduces the computation time compared to directly computing
the DFT. It returns the frequency spectrum, allowing users to understand the frequency content
and amplitude of different signal components.
import numpy as np
from scipy.fft import fft
import matplotlib.pyplot as plt
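# A minimal sketch (the signal is assumed for illustration): a 50 Hz sine
# wave sampled at 500 Hz for one second.
t = np.linspace(0, 1, 500, endpoint=False)
y = np.sin(2 * np.pi * 50 * t)
y_fft = fft(y)

plt.subplot(2, 1, 1)
plt.plot(t, y)
plt.title('Original Signal')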
plt.subplot(2, 1, 2)
plt.plot(np.abs(y_fft))
plt.title('FFT of the Signal')
plt.show()
SciPy also provides core linear algebra routines: the scipy.linalg module includes inv() and eig() functions analogous to the numpy.linalg versions shown earlier:

import numpy as np
from scipy.linalg import inv, eig

Example - ttest_ind

The ttest_ind() function from the scipy.stats module performs an independent two-sample t-test, which checks whether the means of two independent samples differ significantly:

import numpy as np
from scipy.stats import ttest_ind
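# A minimal sketch with made-up sample data (assumed for illustration)
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=5.0, scale=1.0, size=100)
sample_b = rng.normal(loc=5.5, scale=1.0, size=100)

# t-statistic and two-sided p-value for the difference in means
t_stat, p_value = ttest_ind(sample_a, sample_b)
print("t-statistic:", t_stat)
print("p-value:", p_value)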
1.7 Matplotlib
Matplotlib is one of the most popular Python libraries for creating static, interactive, and ani-
mated visualizations. It is particularly useful for generating plots, charts, and graphs that allow
for a detailed representation of data. Below are commonly used Matplotlib functions along with
examples for each category.

Commonly Used Matplotlib Functions:
• Basic Plotting: plot, scatter, bar, hist
• Customizing Plots: title, xlabel, ylabel, legend, grid
• Subplots and Layouts: subplot, subplots, tight_layout, figure
• Colormaps: imshow, colorbar
• Saving Figures: savefig
• Object-Oriented Interface: Axes, Figure
• 3D Plotting: Axes3D, plot_surface, scatter3D
Example - plot
The Matplotlib plot() function is used to create 2D line plots. It is one of the most commonly
used functions for basic plotting in Python and is part of the matplotlib.pyplot module. The
plot() function allows you to visualize data by connecting data points with a straight line. You
can also customize the appearance of the plot, such as line color, style, and markers.
The syntax for plot() is:

matplotlib.pyplot.plot(x, y, fmt, ...)

where:
• x, y: The data for the horizontal and vertical axes.
• fmt (optional): A format string that defines the line style, marker, and color.
Example - scatter
The Matplotlib scatter() function is used to create scatter plots. Scatter plots are useful
for visualizing the relationship between two variables by displaying points at the intersection of
their values on the x-axis and y-axis. The scatter() function is part of the matplotlib.pyplot
module and allows for customization of marker size, color, and transparency.
The syntax for scatter() is:
matplotlib.pyplot.scatter(x, y, s=None, c=None, ...)

where:
• x: The data for the x-axis.
• y: The data for the y-axis.
• s (optional): Marker size.
• c (optional): Marker color.
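A minimal sketch (the points are randomly generated for illustration):

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(50)
y = np.random.rand(50)

plt.scatter(x, y, s=40, c='blue', alpha=0.6)
plt.title('Scatter Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.show()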
Example - bar

The Matplotlib bar() function is used to create bar charts. Bar charts are useful for comparing
different categories or showing the distribution of data. The height of the bars represents the
values of the variables, and you can customize the color, width, and alignment of the bars. The
bar() function is part of the matplotlib.pyplot module.

The syntax for bar() is:

matplotlib.pyplot.bar(x, height, width=0.8, ...)

where:
• x: The categories or positions of the bars.
• height: The heights of the bars.
• width (optional): The width of the bars (default is 0.8).
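A minimal sketch (the categories and values are made up for illustration):

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [5, 7, 3, 8]

plt.bar(categories, values, color='skyblue')
plt.title('Bar Chart')
plt.show()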
Example - hist
The Matplotlib hist() function is used to create histograms. Histograms are useful for visu-
alizing the distribution of a dataset by grouping data into bins. The height of each bar in the
histogram represents the frequency of data points in each bin. The hist() function is part of
the matplotlib.pyplot module.
The syntax for hist() is:
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, ...)
where:
• x: The data to be plotted.
• bins (optional): The number of bins or intervals in which the data is divided. Default is
10.
• range (optional): The lower and upper range of the bins. If not provided, the range is
determined by the data.
# Generating sample data (assumed: 1000 values from a standard normal distribution)
data = np.random.randn(1000)

# Creating a histogram
plt.hist(data, bins=30, color='purple', edgecolor='black')
plt.title('Histogram')
plt.xlabel('Data values')
plt.ylabel('Frequency')
plt.show()
# Generate data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Create plot
plt.plot(x, y1, label='sin(x)', color='blue', linestyle='-', marker='o')
plt.plot(x, y2, label='cos(x)', color='red', linestyle='--', marker='x')
# Add title and axis labels
plt.title('Sine and Cosine')
plt.xlabel('x')
plt.ylabel('y')

# Add legend
plt.legend()
# Display plot
plt.show()
In this example:
• The title() function adds a title to the plot.
• The xlabel() and ylabel() functions label the x-axis and y-axis, respectively.
• The legend() function displays the labels for each line.
Line Styles
The following table lists common line styles:
Style Description
'-' Solid line
'--' Dashed line
'-.' Dash-dot line
':' Dotted line
Colors
Common single-character color codes include:
Code Color
'b' Blue
'g' Green
'r' Red
'c' Cyan
'm' Magenta
'y' Yellow
'k' Black
'w' White
Markers
Markers are used to represent data points on the plot. Common markers include:
Marker Description
’o’ Circle
’x’ X
’s’ Square
’D’ Diamond
’^’ Triangle up
’v’ Triangle down
# Data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Creating plots
plt.plot(x, y1, label='sin(x)')
plt.plot(x, y2, label='cos(x)')
plt.title('Sine and Cosine Functions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend() # Adding a legend
plt.grid(True) # Adding gridlines
plt.show()
Output: A plot with sine and cosine functions, including title, axis labels, a legend, and gridlines.
Example - subplots

The Matplotlib subplots() function is used to create multiple plots (subplots) in a single
figure. This function provides an easy way to create a grid of subplots and manage multiple axes
within the same figure. It returns a figure object and an array of axes objects, allowing for full
control over the layout and content of each subplot.
The syntax for subplots() is:

matplotlib.pyplot.subplots(nrows=1, ncols=1, figsize=None, ...)

where:
• nrows, ncols: The number of rows and columns of the subplot grid.
• figsize (optional): The width and height of the figure in inches.
# Data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Creating subplots
fig, axs = plt.subplots(2, 1, figsize=(6, 6))
# First subplot
axs[0].plot(x, y1)
axs[0].set_title('sin(x)')
# Second subplot
axs[1].plot(x, y2)
axs[1].set_title('cos(x)')

plt.tight_layout()
plt.show()
1.7.4 Colormaps
Example - imshow
The Matplotlib imshow() function is used to display image data. It can be used to visualize
2D arrays as images, where the values of the array are mapped to colors. This is commonly used
in image processing and heatmap visualizations. The imshow() function can display images in
grayscale or color, depending on the colormap applied.
The syntax for imshow() is:
matplotlib.pyplot.imshow(X, cmap=None, interpolation=None, ...)
where:
• X: The image data or 2D array to be displayed.
• cmap (optional): The colormap used to map scalar data to colors (e.g., ’gray’, ’viridis’).
• interpolation (optional): The interpolation method used for rendering the image (e.g.,
’nearest’, ’bilinear’).
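A minimal sketch, displaying a random 2D array as a heatmap (the data is made up for illustration):

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(10, 10)

plt.imshow(data, cmap='viridis', interpolation='nearest')
plt.colorbar()  # adds a color scale next to the image
plt.title('Heatmap of Random Data')
plt.show()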
Example - savefig

The Matplotlib savefig() function saves the current figure to a file. The syntax for savefig() is:

matplotlib.pyplot.savefig(fname, dpi=None, format=None, bbox_inches=None, ...)

where:
• fname: The file name (or path) under which the figure is saved.
• dpi (optional): The resolution of the saved figure in dots per inch (default is 100).
• format (optional): The file format to save as (e.g., ’png’, ’pdf’, ’svg’). If not provided,
the format is inferred from the file extension.
• bbox_inches (optional): Specifies how the bounding box is calculated. ’tight’ ensures
that the figure fits tightly around the elements.
# Data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Creating a plot
plt.plot(x, y, label='sin(x)')
plt.title('Line Plot of sin(x)')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()

# Saving the figure to a file (the file name is illustrative)
plt.savefig('sin_plot.png', dpi=300, bbox_inches='tight')
The Matplotlib Object-Oriented Interface allows you to create plots with more control
and flexibility compared to the functional interface. In this approach, you directly work with
figure and axes objects, which makes it easier to manage complex plots with multiple subplots or
customized layouts.
The syntax typically involves creating a figure and one or more axes using the plt.subplots()
function, and then calling methods on these objects to generate and customize the plot.
# Data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Creating figure and axes objects (a minimal reconstruction; the original
# example body was not preserved)
fig, ax = plt.subplots()
ax.plot(x, y, label='sin(x)')
ax.set_title('sin(x)')
ax.legend()

plt.show()
1.7.7 3D Plotting
Example - plot_surface
Matplotlib provides support for creating 3D plots using the mplot3d toolkit, which is part
of the matplotlib library. One of the most commonly used functions for 3D plotting is
plot_surface(), which allows you to create surface plots to visualize 3D data.
To use 3D plotting, you must first import the Axes3D class and create a 3D projection for the
plot. The plot_surface() function is then used to generate a surface from the provided x, y,
and z data.
The syntax for plot_surface() is:
ax.plot_surface(X, Y, Z, cmap=None, ...)

where:
• X: A 2D array representing the x-coordinates of the surface.
• Y: A 2D array representing the y-coordinates of the surface.
• Z: A 2D array representing the z-coordinates (heights) of the surface.
• cmap (optional): The colormap used to map the values of Z to colors.
# Generating 3D data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
# Creating a 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')  # colormap choice is illustrative
plt.show()
Example -scatter3D
Matplotlib provides support for 3D scatter plots using the mplot3d toolkit. The scatter3D()
function allows you to create 3D scatter plots, which are useful for visualizing points in three-
dimensional space.
The syntax for scatter3D() is:

ax.scatter3D(xs, ys, zs, c=None, ...)

where:
• xs, ys, zs: The coordinates of the points in 3D space.
• c (optional): The color values of the points, which can be mapped through a colormap.
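A minimal sketch (the points are randomly generated for illustration):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y, z = rng.random((3, 50))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(x, y, z, c=z, cmap='viridis')  # color the points by their z value
plt.show()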
1.8 scikit-learn
Scikit-learn is a popular machine learning library in Python, built on top of NumPy, SciPy, and
Matplotlib. It provides simple and efficient tools for data mining, data analysis, and machine
learning. Scikit-learn includes algorithms for classification, regression, clustering, dimensionality
reduction, model selection, and preprocessing. It is widely used for building and evaluating
machine learning models due to its ease of use and comprehensive coverage of machine learning
tasks.

Key Features of Scikit-learn:
• Classification: Identify to which category an object belongs. Examples include logistic
regression, decision trees, support vector machines (SVM), k-nearest neighbors (KNN), and
more.
• Regression: Predict a continuous value. Algorithms include linear regression, ridge re-
gression, lasso regression, and more.
• Clustering: Unsupervised learning tasks to group similar objects. Examples include k-
means clustering, hierarchical clustering, and DBSCAN.
• Dimensionality Reduction: Reduce the number of features or variables in the dataset.
Examples include principal component analysis (PCA) and singular value decomposition
(SVD).
• Model Selection: Methods for tuning hyperparameters and evaluating model perfor-
mance, such as cross-validation, grid search, and random search.
• Preprocessing: Tools to prepare the data, including standardization, normalization, en-
coding categorical variables, and dealing with missing data.
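The following is a minimal example consistent with the output and explanation below; the train/test split details (an 80/20 split with a fixed random_state) are assumptions, since the original code was not preserved:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset (150 samples, 4 features)
X, y = load_iris(return_X_y=True)

# Split into training and test sets (assumed 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train a logistic regression classifier and evaluate on the test set
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.2%}")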
Accuracy: 100.00%
Explanation:
• Dataset: We use the Iris dataset, which contains 150 samples of iris flowers with 4 features
(sepal length, sepal width, petal length, petal width).
• Model: Logistic Regression, a common classifier, is used to predict the species of the iris
flowers.
• Accuracy: The model is evaluated by comparing the predicted values with the actual values
from the test set.