
SKILL ORIENTED PROGRAMMING ON

MACHINE LEARNING
Submitted to JNTUA in partial fulfillment of the requirements
for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted By
B.Srinivasulu
(192H1A0514)

AUDISANKARA
INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(Accredited by NAAC | Approved by AICTE | Affiliated to JNTUA)
NH-5, BYPASS ROAD, GUDUR-524101, SRI BALAJI (DT.), ANDHRA PRADESH.
2022-2023
ACKNOWLEDGEMENT
First and foremost, I would like to thank my beloved parents for their blessings
and grace in making this skill oriented programming a success. I avail this
opportunity to express my profound sense of sincere and deep gratitude to
those who constantly guided, supported and encouraged me during the course of
my skill oriented programming.
I wish to express my heartfelt thanks and deep sense of gratitude to the
honorable chairman Dr. V. PENCHALAIAH for his encouragement and inspiration
throughout the process.
I would like to thank the beloved Director of “AUDISANKARA INSTITUTE OF
TECHNOLOGY”, Dr. A. MOHAN, for creating a competitive environment in our
college and encouraging me throughout this course.
I would like to thank my college management for having allowed me to do the
project work. Lastly, I would like to pay my regards and thank our principal
Dr. T. VENU MADHAV, whose ideas proved to be really worthwhile in my work.
I wish to express my deep sense of gratitude to my beloved and esteemed Head
of the Department of CSE, Dr. A. SWARUPA RANI, Assoc. Professor, for her
support, encouragement and valuable suggestions, which went a long way in the
successful completion of this skill oriented programming.
DECLARATION
I hereby declare that the skill oriented programming entitled “MACHINE
LEARNING” has been successfully completed. This skill oriented programming
work has been submitted to “AUDISANKARA INSTITUTE OF TECHNOLOGY”, GUDUR,
in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology. I also declare that this skill oriented programming
report has not been submitted at any time to another institute or university
for the award of any degree.

B.Srinivasulu
(192H1A0514)

PLACE: GUDUR,
DATE:
INDEX

SL.NO   NAME OF THE CHAPTER                          DATE OF WORK
1       INTRODUCTION TO MACHINE LEARNING             12/08/2022
2       CLASSIFICATION OF MACHINE LEARNING           13/08/2022 - 15/08/2022
3       HISTORY OF MACHINE LEARNING                  16/08/2022
4       LIFE CYCLE OF MACHINE LEARNING               17/08/2022
5       STRUCTURE OF MACHINE LEARNING                18/08/2022
6       CLASSIFICATION ALGORITHM IN ML               19/08/2022
7       LOGISTIC REGRESSION IN ML                    20/08/2022 - 22/08/2022
8       CLUSTERING IN ML                             23/08/2022 - 25/08/2022
9       CLUSTERING ALGORITHMS                        26/08/2022 - 27/08/2022
10      DATA PROCESSING                              28/08/2022 - 29/08/2022
11      REINFORCEMENT LEARNING                       30/08/2022 - 31/08/2022
12      INTRODUCTION TO DIMENSIONALITY REDUCTION     1/09/2022 - 2/09/2022
13      STEP BY STEP IMPLEMENTATION IN PYTHON        3/09/2022 - 5/09/2022
1. INTRODUCTION TO MACHINE LEARNING:
Arthur Samuel, an early American leader in the field of computer gaming
and artificial intelligence, coined the term “Machine Learning” in 1959 while at
IBM. He defined machine learning as “the field of study that gives computers the
ability to learn without being explicitly programmed”.
➢ The field of study known as machine learning is concerned with the
question of how to construct computer programs that automatically
improve with experience.

DEFINITION OF LEARNING:
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks T,
as measured by P, improves with experience E.

EXAMPLES:
1. A handwriting recognition learning problem
> Task T : Recognizing and classifying handwritten words within images
> Performance P : Percent of words correctly classified
> Experience E : A dataset of handwritten words with given classifications
2. A robot driving learning problem
> Task T : Driving on highways using vision sensors
> Performance P : Average distance traveled before an error
> Training experience E : A sequence of images and steering commands recorded
while observing a human driver

DEFINITION:
➢ A computer program which learns from experience is called a machine
learning program or simply a learning program.

2. CLASSIFICATION OF MACHINE LEARNING:


Machine learning implementations are classified into four major categories,
depending on the nature of the learning “signal” or “response” available to a
learning system, which are as follows:

A.Supervised learning
Supervised learning is the machine learning task of learning a function that maps
an input to an output based on example input-output pairs. The given data is
labeled. Both classification and regression problems are supervised learning
problems.
Example — Consider the following data regarding patients entering a clinic. The
data consists of the gender and age of the patients, and each patient is labeled
as “healthy” or “sick”.
Gender   Age   Label
M        49    sick
M        67    sick
F        53    healthy
M        49    sick
F        32    healthy
M        34    healthy
M        21    healthy

B. Unsupervised learning:
Unsupervised learning is a type of machine learning algorithm used to draw
inferences from datasets consisting of input data without labeled responses. In
unsupervised learning algorithms, classification or categorization is not included in
the observations. Example: Consider the following data regarding patients entering
a clinic. The data consists of the gender and age of the patients.

Gender   Age
M        48
M        67
F        53
M        49
F        34
M        21

C. Reinforcement learning:
Reinforcement learning is the problem of getting an agent to act in the world so as
to maximize its rewards.
A learner is not told what actions to take, as in most forms of machine learning,
but instead must discover which actions yield the most reward by trying them.
For example — Consider teaching a dog a new trick: we cannot tell it what to do,
but we can reward/punish it if it does the right/wrong thing.

D. Semi-supervised learning:
In semi-supervised learning, an incomplete training signal is given: a training
set with some (often many) of the target outputs missing. There is a special case
of this principle known as Transduction, where the entire set of problem
instances is known at learning time, except that part of the targets are missing.
Semi-supervised learning is an approach to machine learning that combines a
small amount of labeled data with a large amount of unlabeled data during
training. Semi-supervised learning falls between unsupervised learning and
supervised learning.

3. HISTORY OF MACHINE LEARNING:


➔ 1950 — Alan Turing creates the “Turing Test” to determine if a computer has
real intelligence. To pass the test, a computer must be able to fool a human
into believing it is also human.
➔ 1952 — Arthur Samuel wrote the first computer learning program. The
program was the game of checkers, and the IBM computer improved at the
game the more it played, studying which moves made up winning strategies
and incorporating those moves into its program.
➔ 1957 — Frank Rosenblatt designed the first neural network for computers
(the perceptron), which simulated the thought processes of the human brain.
➔ 1967 — The “nearest neighbor” algorithm was written, allowing computers
to begin using very basic pattern recognition. This could be used to map a
route for traveling salesmen, starting at a random city but ensuring they visit
all cities during a short tour.
➔ 1979 — Students at Stanford University invent the “Stanford Cart” which can
navigate obstacles in a room on its own.
➔ 1981 — Gerald Dejong introduces the concept of Explanation Based Learning
(EBL), in which a computer analyses training data and creates a general rule
it can follow by discarding unimportant data.

➔ 1985 — Terry Sejnowski invents NetTalk, which learns to pronounce words the
same way a baby does.
➔ 1990s — Work on machine learning shifts from a knowledge-driven
approach to a data-driven approach. Scientists begin creating programs for
computers to analyze large amounts of data and draw conclusions — or
“learn” — from the results.
➔ 1997 — IBM’s Deep Blue beats the world champion at chess.
➔ 2006 — Geoffrey Hinton coins the term “deep learning” to explain new
algorithms that let computers “see” and distinguish objects and text in
images and videos.
➔ 2010 — The Microsoft Kinect can track 20 human features at a rate of 30
times per second, allowing people to interact with the computer via
movements and gestures.
➔ 2011 — IBM’s Watson beats its human competitors at Jeopardy.
➔ 2011 — Google Brain is developed, and its deep neural network can learn to
discover and categorize objects much the way a cat does.
➔ 2012 – Google’s X Lab develops a machine learning algorithm that is able to
autonomously browse YouTube videos to identify the videos that contain
cats.
➔ 2014 – Facebook develops DeepFace, a software algorithm that is able to
recognize or verify individuals on photos to the same level as humans can.
➔ 2015 – Amazon launches its own machine learning platform.
➔ 2015 – Microsoft creates the Distributed Machine Learning Toolkit, which
enables the efficient distribution of machine learning problems across
multiple computers.
➔ 2015 – Over 3,000 AI and Robotics researchers, endorsed by Stephen
Hawking, Elon Musk and Steve Wozniak (among many others), sign an open
letter warning of the danger of autonomous weapons which select and
engage targets without human intervention.

➔ 2016 – Google’s artificial intelligence algorithm beats a professional player
at the Chinese board game Go, which is considered the world’s most complex
board game and is many times harder than chess. The AlphaGo algorithm
developed by Google DeepMind managed to win five games out of five in the
Go competition.

4. LIFE CYCLE OF MACHINE LEARNING:


Machine learning has given computer systems the ability to learn automatically
without being explicitly programmed. But how does a machine learning system
work? It can be described using the life cycle of machine learning.
The machine learning life cycle is a cyclic process to build an efficient machine
learning project. The main purpose of the life cycle is to find a solution to the
problem or project.
Machine learning life cycle involves seven major steps, which are given below:
- Gathering Data
- Data preparation
- Data Wrangling
- Analyse Data
- Train the model
- Test the model
- Deployment

5. STRUCTURE OF MACHINE LEARNING:


Structured machine learning refers to learning structured hypotheses from data
with rich internal structure, typically involving one or more relations. In
general, the data may include known inputs as well as outputs, parts of which
can be uncertain, noisy, or missing.

USES OF MACHINE LEARNING:
➢ Finance
➢ Health
➢ Government
➢ Stores
➢ Oil and gas
➢ Transport

[Figure: Structure of Machine Learning]

6. CLASSIFICATION ALGORITHM IN MACHINE LEARNING:
The Classification algorithm is a Supervised Learning technique that is
used to identify the category of new observations on the basis of training data. In
Classification, a program learns from the given dataset or observations and then
classifies new observations into a number of classes or groups, such as Yes or No,
0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or
categories. Unlike regression, the output variable of Classification is a category,
not a value, such as “Green or Blue”, “fruit or animal”, etc. Since the
Classification algorithm is a Supervised learning technique, it takes labeled
input data, which means it contains input with the corresponding output.
In a classification algorithm, a discrete output function (y) is mapped to an
input variable (x):
y = f(x), where y = categorical output
The main goal of the Classification algorithm is to identify the category of a given
dataset, and these algorithms are mainly used to predict the output for the
categorical data.
Classification algorithms can be better understood using the below diagram. In the
below diagram, there are two classes, class A and Class B. These classes have
features that are similar to each other and dissimilar to other classes.

The algorithm which implements the classification on a dataset is known as a
classifier. There are two types of Classifications:
Binary Classifier: If the classification problem has only two possible outcomes,
then it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than two outcomes,
then it is called a Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
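
As a minimal sketch of a binary classifier (assuming scikit-learn is installed;
the encoding of the toy patient data from Section 2 and the choice of a decision
tree are illustrative assumptions, not a prescribed method):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy patient data from Section 2: features are [gender, age],
# with gender encoded as 0 (F) / 1 (M).
X = np.array([[1, 49], [1, 67], [0, 53], [1, 49], [0, 32], [1, 34], [1, 21]])
y = np.array(["sick", "sick", "healthy", "sick", "healthy", "healthy", "healthy"])

clf = DecisionTreeClassifier()    # any classifier could stand in here
clf.fit(X, y)                     # learn from labeled input-output pairs
print(clf.predict([[1, 60]]))     # predict the label of a new observation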

TYPES OF ML CLASSIFICATIONS:
Classification algorithms can be further divided into mainly two categories:
➢ Linear Models
   • Logistic Regression
   • Support Vector Machines
➢ Non-linear Models
   • K-Nearest Neighbours
   • Kernel SVM
   • Naïve Bayes
   • Decision Tree Classification
   • Random Forest Classification

7. LOGISTIC REGRESSION IN MACHINE LEARNING:


Logistic regression is one of the most popular Machine Learning algorithms,
which comes under the Supervised Learning technique. It is used for predicting the
categorical dependent variable using a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable.
Therefore the outcome must be a categorical or discrete value. It can be either
Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0
and 1, it gives probabilistic values which lie between 0 and 1.

9
Logistic Regression is quite similar to Linear Regression except in how they are
used. Linear Regression is used for solving Regression problems, whereas
Logistic Regression is used for solving classification problems.
In Logistic regression, instead of fitting a regression line, we fit an “S” shaped
logistic function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something such as
whether the cells are cancerous or not, a mouse is obese or not based on its weight,
etc.
Logistic Regression is a significant machine learning algorithm because it has the
ability to provide probabilities and classify new data using continuous and discrete
datasets.
Logistic Regression can be used to classify observations using different types of
data and can easily determine the most effective variables used for the
classification. The image below shows the logistic function:

[Figure: the S-shaped logistic (sigmoid) function]

LOGISTIC FUNCTION:
• The sigmoid function is a mathematical function used to map the predicted
values to probabilities.
• It maps any real value into another value within a range of 0 and 1.
• The value of the logistic regression must be between 0 and 1, which cannot
go beyond this limit, so it forms a curve like the “S” form. The S-form curve
is called the Sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which
defines the probability of either 0 or 1. Values above the threshold tend
to 1, and values below the threshold tend to 0.
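
A small sketch of the sigmoid and the thresholding step (plain NumPy; the 0.5
cutoff is an illustrative assumption):

import numpy as np

def sigmoid(z):
    # Maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p = sigmoid(z)                    # probabilities between 0 and 1
labels = (p >= 0.5).astype(int)   # assumed threshold of 0.5 -> class 0 or 1
print(p)
print(labels)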

ASSUMPTIONS:
• The dependent variable must be categorical in nature.
• The independent variables should not have multi-collinearity.

LOGISTIC REGRESSION EQUATION:


The Logistic regression equation can be obtained from the Linear
Regression equation. The mathematical steps to get Logistic Regression
equations are given below:
➔ We know the equation of a straight line can be written as:
   y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn
➔ In Logistic Regression, y can be between 0 and 1 only, so let’s divide the
   above equation by (1 - y):
   y/(1 - y); 0 for y = 0, and infinity for y = 1
➔ But we need a range between -[infinity] and +[infinity], so take the
   logarithm of the equation, and it becomes:
   log[y/(1 - y)] = b0 + b1x1 + b2x2 + b3x3 + … + bnxn

TYPES OF LOGISTIC REGRESSION:
On the basis of the categories, Logistic Regression can be classified into three
types:
Binomial: In binomial Logistic regression, there can be only two possible types of
the dependent variables, such as 0 or 1, Pass or Fail, etc.
Multinomial: In multinomial Logistic regression, there can be 3 or more possible
unordered types of the dependent variable, such as “cat”, “dogs”, or “sheep”.
Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered
types of dependent variables, such as “low”, “Medium”, or “High”.
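
A minimal binomial logistic regression sketch with scikit-learn (the synthetic
hours-studied data and labels are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D data: hours studied -> fail (0) / pass (1).
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)
# predict_proba gives the probabilistic values between 0 and 1 described
# above; predict applies the threshold for us.
print(model.predict_proba([[2.5]]))   # [[P(fail), P(pass)]]
print(model.predict([[2.5]]))         # thresholded class label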

8. CLUSTERING IN MACHINE LEARNING:


Clustering or cluster analysis is a machine learning technique which groups an
unlabelled dataset. It can be defined as “a way of grouping the data points into
different clusters, consisting of similar data points. The objects with possible
similarities remain in a group that has less or no similarity with another group.”
It does this by finding similar patterns in the unlabelled dataset, such as shape,
size, color, behavior, etc., and divides the data as per the presence and absence
of those patterns.
It is an unsupervised learning method; hence no supervision is provided to the
algorithm, and it deals with an unlabeled dataset.
After applying this clustering technique, each cluster or group is given a
cluster-ID. The ML system can use this ID to simplify the processing of large
and complex datasets.
The clustering technique is commonly used for statistical data analysis.
The clustering technique can be widely used in various tasks. Some most common
uses of this technique are:

• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Apart from these general usages, it is used by Amazon in its recommendation
system to provide recommendations based on past product searches. Netflix
also uses this technique to recommend movies and web series to its users based
on their watch history.
The below diagram explains the working of the clustering algorithm. We can see
that different fruits are divided into several groups with similar properties.

TYPES OF CLUSTERING:
The clustering methods are broadly divided into Hard clustering (a data point
belongs to only one group) and Soft clustering (data points can belong to
another group also), but various other approaches to clustering also exist.
Below are the main clustering methods used in Machine learning:
• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of k groups, where K is used to define
the number of pre-defined groups. The cluster center is created in such a way that
the distance between the data points of one cluster is minimum as compared to
another cluster centroid.

Density-Based Clustering

The density-based clustering method connects highly-dense areas into clusters,
and arbitrarily shaped distributions are formed as long as the dense region can
be connected. The algorithm does this by identifying different clusters in the
dataset and connecting the areas of high density into clusters. The dense areas
in data space are divided from each other by sparser areas.
These algorithms can face difficulty in clustering the data points if the dataset has
varying densities and high dimensions.

Distribution Model-Based Clustering:

In the distribution model-based clustering method, the data is divided based
on the probability of how a dataset belongs to a particular distribution. The
grouping is done by assuming some distributions, commonly the Gaussian
distribution.
The example of this type is the Expectation-Maximization Clustering
algorithm, which uses Gaussian Mixture Models (GMM).

Hierarchical Clustering:

Hierarchical clustering can be used as an alternative to partitioned clustering,
as there is no requirement of pre-specifying the number of clusters to be
created. In this technique, the dataset is divided into clusters to create a
tree-like structure, which is also called a dendrogram. The observations or any
number of clusters can be selected by cutting the tree at the correct level. The
most common example of this method is the Agglomerative Hierarchical
algorithm.

Fuzzy Clustering:

Fuzzy clustering is a type of soft method in which a data object may belong
to more than one group or cluster. Each dataset has a set of membership
coefficients, which depend on the degree of membership to be in a cluster.
The Fuzzy C-means algorithm is an example of this type of clustering; it is
sometimes also known as the Fuzzy k-means algorithm.

9. CLUSTERING ALGORITHMS:
The clustering algorithms can be divided based on the models explained above.
There are many types of clustering algorithms published, but only a few are
commonly used. The choice of clustering algorithm depends on the kind of data
we are using: some algorithms need the number of clusters in the given dataset
to be guessed, whereas others need the minimum distance between observations
of the dataset to be found.

Here we discuss the mainly popular clustering algorithms that are widely used
in machine learning:

K-Means algorithm: The k-means algorithm is one of the most popular
clustering algorithms. It classifies the dataset by dividing the samples into
different clusters of equal variances. The number of clusters must be specified
in this algorithm. It is fast, with fewer computations required, with the linear
complexity of O(n).
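
A minimal k-means sketch with scikit-learn (the 2-D points and the choice of
k = 2 are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled 2-D points forming two loose groups.
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 10], [10, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # k must be specified
kmeans.fit(X)
print(kmeans.labels_)           # cluster-ID assigned to each point
print(kmeans.cluster_centers_)  # learned centroids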

Mean-shift algorithm: The mean-shift algorithm tries to find dense areas in
the smooth density of data points. It is an example of a centroid-based model
that works by updating the candidates for centroids to be the center of the
points within a given region.

DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of
Applications with Noise. It is an example of a density-based model, similar to
mean-shift, but with some remarkable advantages. In this algorithm, the areas
of high density are separated by the areas of low density. Because of this, the
clusters can be found in any arbitrary shape.
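
A short DBSCAN sketch with scikit-learn (the points and the eps/min_samples
density parameters are illustrative assumptions):

import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points: two dense groups plus one sparse outlier.
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8],
              [25, 25]])

db = DBSCAN(eps=2.0, min_samples=2)  # assumed density parameters
labels = db.fit_predict(X)
print(labels)  # -1 marks noise; the number of clusters is not given up front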

Expectation-Maximization Clustering using GMM: This algorithm can be
used as an alternative to the k-means algorithm, or for those cases where
k-means can fail. In GMM, it is assumed that the data points are Gaussian
distributed.

Agglomerative Hierarchical algorithm: The Agglomerative hierarchical
algorithm performs bottom-up hierarchical clustering. In this, each data
point is treated as a single cluster at the outset and then successively merged.
The cluster hierarchy can be represented as a tree structure.

Affinity Propagation: It is different from other clustering algorithms in that
it does not require the number of clusters to be specified. In this, each data
point sends a message between the pair of data points until convergence. It has
O(N²T) time complexity, which is the main drawback of this algorithm.

10. DATA PROCESSING:


Data Processing is the task of converting data from a given form to a much
more usable and desired form, i.e., making it more meaningful and informative.
Using Machine Learning algorithms, mathematical modeling, and statistical
knowledge, this entire process can be automated. The output of this complete
process can be in any desired form like graphs, videos, charts, tables, images,
and many more, depending on the task we are performing and the
requirements of the machine. This might seem simple, but when it comes to
massive organizations like Twitter or Facebook, administrative bodies like
Parliament or UNESCO, and health sector organizations, this entire process
needs to be performed in a very structured manner. The steps to perform are
as follows:

Collection:
The most crucial step when starting with ML is to have data of good quality and
accuracy. Data can be collected from any authenticated source, like the Kaggle
or UCI dataset repositories. For example, while preparing for a competitive
exam, students study from the best study material they can access so that they
learn the best and obtain the best results. In the same way, high-quality and
accurate data will make the learning process of the model easier and better,
and at the time of testing, the model will yield state-of-the-art results. A huge
amount of capital, time and resources is consumed in collecting data.
Organizations or researchers have to decide what kind of data they need to
execute their tasks or research.
Example: Working on a facial expression recognizer needs numerous images
with a variety of human expressions. Good data ensures that the results of the
model are valid and can be trusted.
Preparation:
The collected data can be in a raw form which can’t be directly fed to the
machine. So, this is a process of collecting datasets from different sources,
analyzing these datasets and then constructing a new dataset for further
processing and exploration. This preparation can be performed either manually
or with an automatic approach. Data can also be prepared in numeric form,
which would speed up the model’s learning.
Example: An image can be converted to a matrix of N x N dimensions, with the
value of each cell indicating an image pixel.
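
A tiny sketch of that preparation step (NumPy; the 4 x 4 size and the 8-bit
grayscale range are illustrative assumptions):

import numpy as np

# Hypothetical 4 x 4 grayscale image: each cell is one pixel intensity (0-255).
image = np.array([[  0,  64, 128, 255],
                  [ 32,  96, 160, 224],
                  [ 16,  80, 144, 208],
                  [  8,  72, 136, 200]], dtype=np.uint8)

# Scale the pixels to the range [0, 1] so the model learns faster.
prepared = image.astype(np.float32) / 255.0
print(prepared.shape, prepared.min(), prepared.max())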

Input:
The prepared data can be in a form that may not be machine-readable, so to
convert this data to a readable form, some conversion algorithms are needed.
For this task to be executed, high computation and accuracy are needed.
Example: Data can be collected through sources like the MNIST digit dataset
(images), Twitter comments, audio files, and video clips.

Processing:
This is the stage where algorithms and ML techniques are required to perform
the instructions provided over a large volume of data with accuracy and
optimal computation.

Output:
In this stage, results are procured by the machine in a meaningful manner
which can be inferred easily by the user. Output can be in the form of reports,
graphs, videos, etc.

Storage:
This is the final step, in which the obtained output, the data model, and all
the useful information are saved for future use.

11. REINFORCEMENT LEARNING :


Reinforcement learning is an area of Machine Learning. It is about taking
suitable action to maximize reward in a particular situation. It is employed
by various software and machines to find the best possible behavior or path
to take in a specific situation. Reinforcement learning differs from supervised
learning in that in supervised learning the training data has the answer key
with it, so the model is trained with the correct answer itself, whereas in
reinforcement learning there is no answer, and the reinforcement agent
decides what to do to perform the given task. In the absence of a training
dataset, it is bound to learn from its experience.

Example: The problem is as follows: we have an agent and a reward, with
many hurdles in between. The agent is supposed to find the best possible
path to reach the reward. The following example explains the problem more
easily.

[Figure: a robot, a diamond, and fire]

The image shows a robot, a diamond, and fire. The goal of the robot is to get
the reward, that is, the diamond, and avoid the hurdles, that is, the fire. The
robot learns by trying all the possible paths and then choosing the path which
gives it the reward with the least hurdles. Each right step gives the robot a
reward, and each wrong step subtracts from the robot’s reward. The total
reward is calculated when it reaches the final reward, that is, the diamond.

Main points in Reinforcement learning –

• Input: The input should be an initial state from which the model will
start.
• Output: There are many possible outputs, as there are a variety of
solutions to a particular problem.
• Training: The training is based upon the input. The model will return
a state, and the user will decide to reward or punish the model based
on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
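
A minimal Q-learning sketch of the agent/reward idea above (the one-dimensional
grid, the reward values, and the hyperparameters are all illustrative
assumptions, not from the source):

import random

# Hypothetical corridor of 5 states; state 4 holds the diamond (+10),
# state 0 holds the fire (-10). Actions: 0 = move left, 1 = move right.
N_STATES, ACTIONS = 5, [0, 1]
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # assumed learning hyperparameters

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    if nxt == N_STATES - 1:
        return nxt, 10.0, True   # reached the diamond: reward
    if nxt == 0:
        return nxt, -10.0, True  # stepped into the fire: punishment
    return nxt, -1.0, False      # small cost for every extra move

for episode in range(500):
    state, done = 2, False       # each episode starts mid-corridor
    while not done:
        # Epsilon-greedy: mostly exploit what was learned, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        # Update Q toward the reward plus the discounted future value.
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = nxt

# For the non-terminal states, the learned policy should prefer
# moving right (action 1), toward the diamond.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, N_STATES - 1)])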

Types of Reinforcement:
There are two types of Reinforcement:

Positive –

Positive reinforcement occurs when an event, occurring due to a particular
behavior, increases the strength and the frequency of that behavior. In other
words, it has a positive effect on behavior.

Advantages of positive reinforcement learning:
• Maximizes performance
• Sustains change for a long period of time
Disadvantage: Too much reinforcement can lead to an overload of states,
which can diminish the results.

Negative –

Negative reinforcement is defined as the strengthening of behavior because
a negative condition is stopped or avoided.

Advantages of negative reinforcement learning:
• Increases behavior
• Provides defiance to a minimum standard of performance
Disadvantage: It only provides enough to meet the minimum behavior.

Various practical applications of Reinforcement Learning –

• RL can be used in robotics for industrial automation.
• RL can be used in machine learning and data processing.
• RL can be used to create training systems that provide custom
instruction and materials according to the requirements of students.

RL can be used in large environments in the following situations:

• A model of the environment is known, but an analytic solution is
not available;
• Only a simulation model of the environment is given (the subject
of simulation-based optimization);
• The only way to collect information about the environment is to
interact with it.

12. INTRODUCTION TO DIMENSIONALITY REDUCTION:


In machine learning classification problems, there are often too
many factors on the basis of which the final classification is done.
These factors are basically variables called features. The higher the
number of features, the harder it gets to visualize the training set
and then work on it. Sometimes, most of these features are
correlated, and hence redundant. This is where dimensionality
reduction algorithms come into play. Dimensionality reduction is the
process of reducing the number of random variables under
consideration, by obtaining a set of principal variables. It can be
divided into feature selection and feature extraction.

Why is Dimensionality Reduction important in Machine Learning
and Predictive Modeling?

An intuitive example of dimensionality reduction can be discussed through a
simple e-mail classification problem, where we need to classify whether the
e-mail is spam or not. This can involve a large number of features, such as
whether or not the e-mail has a generic title, the content of the e-mail, whether
the e-mail uses a template, etc. However, some of these features may overlap.
In another condition, a classification problem that relies on both humidity and
rainfall can be collapsed into just one underlying feature, since both of the
aforementioned are correlated to a high degree. Hence, we can reduce the
number of features in such problems. A 3-D classification problem can be hard
to visualize, whereas a 2-D one can be mapped to a simple 2-dimensional space,
and a 1-D problem to a simple line. The below figure illustrates this concept,
where a 3-D feature space is split into two 2-D feature spaces, and later, if
found to be correlated, the number of features can be reduced even further.

Components of Dimensionality Reduction

There are two components of dimensionality reduction:

Feature selection: In this, we try to find a subset of the original set of
variables, or features, to get a smaller subset which can be used to model the
problem. It usually involves three ways:

• Filter
• Wrapper
• Embedded

Feature extraction: This reduces the data in a high-dimensional space to a
lower-dimensional space, i.e., a space with a smaller number of dimensions.
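
A brief feature-selection sketch using scikit-learn’s filter approach (the
synthetic data and the choice of k = 1 are illustrative assumptions):

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = np.array([[1.0, 5.2], [1.1, 4.8], [0.9, 5.1],
              [9.0, 5.0], [9.2, 4.9], [8.8, 5.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Filter method: score each feature independently, keep the best k.
selector = SelectKBest(score_func=f_classif, k=1)
X_reduced = selector.fit_transform(X, y)
print(selector.get_support())  # [ True False ]: feature 0 is kept
print(X_reduced.shape)         # (6, 1): a smaller subset of features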

Methods of Dimensionality Reduction:

The various methods used for dimensionality reduction include:

• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)

Dimensionality reduction may be either linear or non-linear, depending upon
the method used. The prime linear method, called Principal Component
Analysis, or PCA, is discussed below.

Principal Component Analysis:

This method was introduced by Karl Pearson. It works on the condition that
while the data in a higher-dimensional space is mapped to data in a
lower-dimensional space, the variance of the data in the lower-dimensional
space should be maximum.

It involves the following steps:

• Construct the covariance matrix of the data.
• Compute the eigenvectors of this matrix.
• Eigenvectors corresponding to the largest eigenvalues are used to
reconstruct a large fraction of the variance of the original data.

Hence, we are left with a smaller number of eigenvectors, and there might
have been some data loss in the process. But the most important variances
should be retained by the remaining eigenvectors.
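
A compact NumPy sketch of those steps (the random correlated 2-D data and
keeping a single component are illustrative assumptions):

import numpy as np

# Hypothetical correlated 2-D data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.3, size=100)])

Xc = X - X.mean(axis=0)                 # center the data first
cov = np.cov(Xc, rowvar=False)          # 1. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # 2. eigenvectors of this matrix
order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue

# 3. keep the eigenvector with the largest eigenvalue (one component).
top = eigvecs[:, order[:1]]
X_reduced = Xc @ top                    # project 2-D data down to 1-D
print(X_reduced.shape)                  # (100, 1)
print(eigvals[order] / eigvals.sum())   # fraction of variance retained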

Advantages of Dimensionality Reduction:

• It helps in data compression, and hence reduced storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.

Disadvantages of Dimensionality Reduction:

• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is
sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to
define datasets.
• We may not know how many principal components to keep; in
practice, some rules of thumb are applied.

Why do we prefer Python to implement machine learning algorithms?

Python is a popular and general-purpose programming language. We can
write machine learning algorithms using Python, and it works well. The reason
why Python is so popular among data scientists is that Python has a diverse
variety of modules and libraries already implemented that make our life more
comfortable.

Let us have a brief look at some exciting Python libraries:

Numpy: It is a math library to work with n-dimensional arrays in Python. It
enables us to do computations effectively and efficiently.

SciPy: It is a collection of numerical algorithms and domain-specific
toolboxes, including signal processing, optimization, statistics, and much
more. SciPy is a functional library for scientific and high-performance
computations.

Scikit-learn: It is a free machine learning library for the Python programming
language. It has most of the classification, regression, and clustering
algorithms, and works with Python numerical libraries such as Numpy and
SciPy.

Matplotlib: It is a trendy plotting package that provides 2D plotting as well
as 3D plotting.

13. STEP BY STEP IMPLEMENTATION IN PYTHON:

a. Import required libraries:

Since we are going to use various libraries for calculations, we need to
import them.
b. Read the CSV file:

We check the first five rows of our dataset. In this case, we are using a vehicle
model dataset — please check out the dataset on Softlayer IBM.

c. Select the features we want to consider in predicting values:

Here our goal is to predict the value of “CO2 emissions” from the value of
“engine size” in our dataset.

d. Plot the data:

We can visualize our data on a scatter plot.

e. Divide the data into training and testing data:

To check the accuracy of a model, we are going to divide our data into
training and testing datasets. We will use training data to train our model,
and then we will check the accuracy of our model using the testing dataset.

f. Training our model:

Here is how we can train our model and find the coefficients for our best-fit
regression line.

g. Plot the best fit line:

Based on the coefficients, we can plot the best fit line for our dataset.

h. Prediction function:

We are going to use a prediction function for our testing dataset.

i. Predicting CO2 emissions:

Predicting the values of CO2 emissions based on the regression line.

j. Checking accuracy for test data:

We can check the accuracy of a model by comparing the actual values with
the predicted values in our dataset.

Put it all together

# Import required libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# Read the CSV file:
data = pd.read_csv("Fuel.csv")
data.head()

# Let's select some features to explore more:
data = data[["ENGINESIZE", "CO2EMISSIONS"]]

# ENGINESIZE vs CO2EMISSIONS:
plt.scatter(data["ENGINESIZE"], data["CO2EMISSIONS"], color="blue")
plt.xlabel("ENGINESIZE")
plt.ylabel("CO2EMISSIONS")
plt.show()

# Generating training and testing data from our data:
# We are using 80% of the data for training.
train = data[:int(len(data) * 0.8)]
test = data[int(len(data) * 0.8):]

# Modeling:
# Using the sklearn package to model the data:
regr = linear_model.LinearRegression()
train_x = np.array(train[["ENGINESIZE"]])
train_y = np.array(train[["CO2EMISSIONS"]])
regr.fit(train_x, train_y)

# The coefficients:
print("Coefficients :", regr.coef_)    # slope
print("Intercept :", regr.intercept_)  # intercept

# Plotting the regression line:
plt.scatter(train["ENGINESIZE"], train["CO2EMISSIONS"], color="blue")
plt.plot(train_x, regr.coef_ * train_x + regr.intercept_, "-r")
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()

# Predicting values:
# Function for predicting future values:
def get_regression_predictions(input_features, intercept, slope):
    predicted_values = input_features * slope + intercept
    return predicted_values

# Predicting emission for a future car:
my_engine_size = 3.5
estimated_emission = get_regression_predictions(my_engine_size,
                                                regr.intercept_[0],
                                                regr.coef_[0][0])
print("Estimated Emission :", estimated_emission)

# Checking various accuracy metrics:
from sklearn.metrics import r2_score

test_x = np.array(test[["ENGINESIZE"]])
test_y = np.array(test[["CO2EMISSIONS"]])
test_y_ = regr.predict(test_x)
print("Mean absolute error: %.2f" % np.mean(np.absolute(test_y_ - test_y)))
print("Mean sum of squares (MSE): %.2f" % np.mean((test_y_ - test_y) ** 2))
print("R2-score: %.2f" % r2_score(test_y, test_y_))

