
Bias and Variance in Machine Learning

Machine learning is a branch of Artificial Intelligence that allows machines to perform data
analysis and make predictions. However, if a machine learning model is not accurate, it makes
prediction errors, and these prediction errors are usually described in terms of Bias and Variance. In
machine learning, some error will always be present, as there is always a slight difference
between the model's predictions and the actual values. The main aim of ML/data science analysts
is to reduce these errors in order to get more accurate results. In this topic, we are going to
discuss bias and variance, the bias-variance trade-off, and underfitting and overfitting. But before
starting, let's first understand what errors in machine learning are.

Errors in Machine Learning?


In machine learning, an error is a measure of how accurately an algorithm can make predictions
on a previously unseen dataset. On the basis of these errors, the machine learning model that
performs best on the particular dataset is selected. There are mainly two types of errors in
machine learning:
o Reducible errors: These errors can be reduced to improve the model accuracy. They can further be
classified into bias and variance.

o Irreducible errors: These errors will always be present in the model regardless of which algorithm
is used. They are caused by unknown variables whose effect on the output cannot be removed.

What is Bias?
In general, a machine learning model analyses the data, finds patterns in it, and makes predictions.
While training, the model learns these patterns in the dataset and applies them to test data for
prediction. When making predictions, a difference occurs between the values predicted
by the model and the actual/expected values, and this difference is
known as bias error, or error due to bias. It can be defined as the inability of a machine
learning algorithm such as Linear Regression to capture the true relationship between the data
points. Each algorithm begins with some amount of bias, because bias arises from the assumptions
in the model that make the target function simpler to learn. A model has either:

o Low Bias: A low-bias model makes fewer assumptions about the form of the target function.
o High Bias: A high-bias model makes more assumptions and becomes unable to capture the
important features of the dataset. A high-bias model also performs poorly on new data.

Generally, a linear algorithm has high bias, as its strong assumptions make it learn fast. The simpler the
algorithm, the more bias it is likely to introduce, whereas a nonlinear algorithm often
has low bias.
Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest
Neighbours and Support Vector Machines. Algorithms with high bias include Linear Regression,
Linear Discriminant Analysis and Logistic Regression.

Ways to reduce High Bias:

High bias mainly occurs when the model is too simple. Below are some ways to reduce high
bias (a small sketch of the last point follows the list):

o Increase the number of input features, as the model is underfitted.
o Decrease the regularization term.
o Use a more complex model, for example by including some polynomial features.
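As a small illustration of the last point, here is a minimal sketch, assuming scikit-learn is available (the dataset and the chosen degree are illustrative assumptions, not from the original text): a plain linear model underfits curved data (high bias), while adding polynomial features reduces that bias.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative non-linear data: y is roughly quadratic in x plus a little noise
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.3, size=60)

# High-bias model: a straight line fitted to a curved relationship
linear = LinearRegression().fit(X, y)

# Lower-bias model: add polynomial features before the linear fit
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("Linear R^2:", round(linear.score(X, y), 3))      # noticeably lower
print("Polynomial R^2:", round(poly.score(X, y), 3))    # captures the curve
```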

What is a Variance Error?


Variance specifies how much the prediction would change if a different training dataset
were used. In simple words, variance tells how much a random variable differs
from its expected value. Ideally, a model should not vary too much from one training dataset
to another, which means the algorithm should be good at capturing the hidden mapping
between the input and output variables. Variance errors are classified as either low variance or high
variance.

Low variance means there is only a small variation in the prediction of the target function with
changes in the training dataset, while high variance means there is a large variation in the
prediction of the target function with changes in the training dataset.

A model that shows high variance learns a lot from the training dataset and performs well on it, but
does not generalize well to unseen data. As a result, such a model gives good results
on the training dataset but shows high error rates on the test dataset.

Since, with high variance, the model learns too much from the dataset, high variance leads to overfitting of
the model. A model with high variance has the following problems:

o A high variance model leads to overfitting.
o It increases model complexity.

Usually, nonlinear algorithms, which have a lot of flexibility when fitting a model, have high variance.
Some examples of machine learning algorithms with low variance are Linear Regression,
Logistic Regression, and Linear Discriminant Analysis. Algorithms with
high variance include Decision Trees, Support Vector Machines, and k-Nearest Neighbours.

Ways to Reduce High Variance:

o Reduce the number of input features or parameters, as the model is overfitted.
o Do not use an overly complex model.
o Increase the training data.
o Increase the regularization term (a small sketch of this follows).
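A minimal sketch of increasing the regularization term, assuming scikit-learn: Ridge regression adds an L2 penalty whose strength alpha (an assumed, untuned value here) shrinks the coefficients and usually reduces variance on noisy, high-dimensional data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Illustrative data: many noisy features, only the first one is informative
rng = np.random.RandomState(1)
X = rng.normal(size=(80, 30))
y = 2.0 * X[:, 0] + rng.normal(scale=2.0, size=80)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

ols = LinearRegression().fit(X_train, y_train)   # no regularization
ridge = Ridge(alpha=10.0).fit(X_train, y_train)  # L2 penalty lowers variance

print("OLS test R^2:  ", round(ols.score(X_test, y_test), 3))
print("Ridge test R^2:", round(ridge.score(X_test, y_test), 3))
```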

Different Combinations of Bias-Variance


There are four possible combinations of bias and variance, which are represented by the diagram
below:
1. Low-Bias, Low-Variance:
The combination of low bias and low variance is the ideal machine learning model.
However, it is rarely achievable in practice.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are
inconsistent but accurate on average. This case occurs when the model learns with a
large number of parameters and hence leads to overfitting.
3. High-Bias, Low-Variance: With high bias and low variance, predictions are consistent
but inaccurate on average. This case occurs when a model does not learn well from the
training dataset or uses a small number of parameters. It leads to underfitting problems
in the model.
4. High-Bias, High-Variance:
With high bias and high variance, predictions are both inconsistent and inaccurate on
average.

How to identify High variance or High Bias?


High variance can be identified if the model has:
o Low training error and high test error.

High bias can be identified if the model has:

o High training error, with a test error that is almost the same as the training error.

Bias-Variance Trade-Off
While building a machine learning model, it is really important to take care of bias and
variance in order to avoid overfitting and underfitting. If the model is very simple,
with few parameters, it may have low variance and high bias; whereas if the model has a large
number of parameters, it will have high variance and low bias. So we need to strike a
balance between bias and variance errors, and this balance between the bias error and variance
error is known as the Bias-Variance trade-off.
For accurate predictions, an algorithm needs both low variance and low bias. But this is
usually not possible, because bias and variance are related to each other:

o If we decrease the variance, it will increase the bias.


o If we decrease the bias, it will increase the variance.

The Bias-Variance trade-off is a central issue in supervised learning. Ideally, we want a model that
accurately captures the regularities in the training data and simultaneously generalizes well to
unseen data. Unfortunately, doing both at once is typically not possible: a high-variance
algorithm may perform well on training data but may overfit noisy data,
whereas a high-bias algorithm produces a much simpler model that may not even capture the
important regularities in the data. So we need to find a sweet spot between bias and variance to
build an optimal model.

Hence, the Bias-Variance trade-off is about finding the sweet spot that balances bias and
variance errors. The short sketch below illustrates how training and test error behave as model complexity changes.
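A minimal sketch of the trade-off, assuming scikit-learn (the data and the polynomial degrees are illustrative assumptions): as model complexity grows, training error keeps falling, while test error first falls and then rises once the model starts overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(2)
X = np.sort(rng.uniform(-3, 3, 120)).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(scale=0.25, size=120)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

for degree in (1, 4, 15):  # too simple -> balanced -> too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```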
MACHINE LEARNING
PROGRAM: B.TECH (CSE-DATA SCIENCE)

SEM-V

TAKEN BY: PROF. SHWETA LOONKAR

[email protected]
Syllabus

Unit 1. Introduction: What is Machine Learning. Supervised Learning. Unsupervised Learning. (2 hours)

Unit 2. Linear Model Selection and Regularization: Linear regression. Hypothesis representation. Gradient descent. Cost function. Linear regression with multiple variables. Polynomial regression. Logistic regression. Hypothesis representation. Gradient descent. Cost function. Linear regression with multiple variables. Normal Equation. Polynomial regression. Regularization. (8 hours)

Unit 3. Moving Beyond Linearity: Neural networks. Hypothesis representation. Cost function. Back propagation. Activation function. (5 hours)

Unit 4. Machine Learning System Design: Evaluating hypothesis. Train – Validation – Test. Bias and variance curves. Error analysis. Error metrics for skewed classes. Precision and bias tradeoff. (2 hours)

Unit 5. Tree-Based Methods: The Basics of Decision Trees, Regression Trees, Classification Trees, Trees Versus Linear Models, Advantages and Disadvantages of Trees, Bagging, Random Forests, Boosting. (4 hours)

Unit 6. Support Vector Machines: Maximal Margin Classifier, Support Vector Classifiers, Support Vector Machines, SVMs with More than Two Classes, Relationship to Logistic Regression, ROC Curves, Application to Gene Expression Data. (4 hours)

Unit 7. Unsupervised Learning: The Challenge of Unsupervised Learning, Principal Components Analysis, Clustering Methods, K-Means Clustering, Hierarchical Clustering, Anomaly detection and large scale machine learning. (5 hours)

Total: 30 hours
Teaching and Evaluation Scheme

Program: B. Tech. CSDS    Semester: II
Course/Module: Machine Learning    Module Code:

Teaching Scheme: Lecture 2 hours per week, Practical 2 hours per week, Tutorial 0 hours per week, 3 credits.

Evaluation Scheme: Internal Continuous Assessment (ICA): 50 marks, scaled to 50. Term End Examination (TEE): 100 marks in the question paper, scaled to 50.
What is Learning??
• Learning is a process that improves
the knowledge of an AI program by
making observations about its
environment.
• To understand the different types of AI
learning models, we can use two of the
main elements of human learning
processes:
• Knowledge: From the knowledge
perspective, learning models can be
classified based on the representation of
input and output data points.
• Feedback: AI learning models can be
classified based on their interactions with the
outside environment, users and other
external factors.

Difference Between AI, ML and DL


Existence of AI, ML and Deep Learning
What is Machine Learning??
Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to “self-
learn” from training data and improve over time, without being explicitly programmed. Machine
learning algorithms are able to detect patterns in data and learn from them, in order to make
their own predictions.
State of the Art Applications for ML
What are the steps involved in building Machine
Learning models?
Any machine learning model development can broadly be divided into six
steps:
•Problem definition involves converting a Business Problem to a
machine learning problem
•Hypothesis generation is the process of creating a possible
business hypothesis and potential features for the model
•Data Collection requires you to collect the data for testing your
hypothesis and building the model
•Data Exploration and cleaning helps you remove outliers, missing
values and then transform the data into the required format
•Modeling is where you actually build the machine learning
models
•Once built, you will deploy the models
Supervised Learning
• Supervised learning, as the name indicates, has the presence of a supervisor as a teacher.
• Basically supervised learning is when we teach or train the machine using data that is well labeled.
• This means that some data is already tagged with the correct answer.
• After that, the machine is provided with a new set of examples(data) so that the supervised learning algorithm
analyses the training data(set of training examples) and produces a correct outcome from labeled data.
Supervised Learning Process: Two Steps

 Learning (training): Learn a model using the training data

 Testing: Test the model using unseen test data to assess the model accuracy

Accuracy = Number of correct classifications / Total number of test cases
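A tiny sketch of this accuracy calculation in plain Python (the labels below are made up for illustration):

```python
def accuracy(predicted, actual):
    """Fraction of test cases whose predicted class matches the true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Illustrative test labels
y_true = ["yes", "yes", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]
print(accuracy(y_pred, y_true))  # 0.8
```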
What do we mean by Learning?
• Given

• a data set D,

• a task T, and

• a performance measure M,

a computer system is said to learn from D to perform the task T if after learning the system’s
performance on T improves as measured by M.

• In other words, the learned model helps the system to perform T better as compared to no
learning.
An Example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.

No learning: classify all future applications (test data) to the majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
Fundamental Assumption of Learning
Assumption: The distribution of training examples is identical to the distribution of test
examples (including future unseen examples).

• In practice, this assumption is often violated to a certain degree.


• Strong violations will clearly result in poor classification accuracy.
• To achieve good accuracy on the test data, training examples must be sufficiently
representative of the test data.
Steps Involved in Supervised Learning:
•First, determine the type of training dataset.
•Collect/gather the labelled training data.
•Split the data into a training dataset, test dataset, and validation dataset.
•Determine the input features of the training dataset, which should carry enough information for the
model to accurately predict the output.
•Determine a suitable algorithm for the model, such as a support vector machine, decision tree, etc.
•Execute the algorithm on the training dataset. Sometimes we need a validation set to tune control
parameters; it is a subset of the training dataset.
•Evaluate the accuracy of the model by providing the test set. If the model predicts the correct outputs,
the model is accurate. A short end-to-end sketch of these steps follows.
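A minimal end-to-end sketch of these steps, assuming scikit-learn and its bundled Iris dataset (the dataset and the choice of a decision tree are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Collect labelled data and split it into training and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Choose a suitable algorithm and execute it on the training dataset
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate the accuracy of the model on the unseen test set
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```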
Types of Supervised Learning
• Supervised learning can be further divided into two types of problems:
Unsupervised Learning
• Unsupervised learning is the training of a machine using information that is neither classified
nor labeled.
• It allows the algorithm to act on that information without guidance.
• Here the task of the machine is to group unsorted information according to similarities, patterns,
and differences without any prior training of data.
• Unlike supervised learning, no teacher is provided, which means no training labels are given to the
machine. Therefore, the machine is left to find the hidden structure in unlabeled data by
itself.
Reinforcement Learning
• Reinforcement learning is an area of Machine Learning.
• It is about taking suitable action to maximize reward in a particular situation.
• It is employed by various software and machines to find the best possible behavior or path it
should take in a specific situation.
• Reinforcement learning differs from supervised learning in that, in supervised learning,
the training data carries the answer key, so the model is trained with the correct answer
itself.
• In reinforcement learning, there is no answer key; instead, the reinforcement agent decides
what to do to perform the given task. In the absence of a training dataset, it is bound to learn
from its experience.
Terminologies Used in Reinforcement Learning

Agent – the sole decision-maker and learner.

Environment – the physical world in which the agent learns and decides which actions to perform.
Action – the set of actions an agent can perform.
State – the current situation of the agent in the environment.
Reward – for each action selected by the agent, the environment gives a reward; it is usually a scalar value
and is nothing but feedback from the environment.
Policy – the strategy (decision-making) the agent prepares to map situations to actions.
Value Function – the value of a state is the reward accumulated starting from that state while the
policy is executed.
Model – not every RL agent uses a model of its environment; the model is the agent's view that maps
state-action pairs to probability distributions over states.
Reinforcement Learning Workflow

– Create the environment
– Define the reward
– Create the agent
– Train and validate the agent
– Deploy the policy
Semi Supervised Learning
• Where an incomplete training signal is given: a training set with some (often many) of the
target outputs missing.
• There is a special case of this principle known as Transduction where the entire set of problem
instances is known at learning time, except that part of the targets are missing.
• Semi-supervised learning is an approach to machine learning that combines a small amount of labeled
data with a large amount of unlabeled data during training. Semi-supervised learning falls
between unsupervised learning and supervised learning.
What are some of the latest achievements and
developments in machine learning?
Some of the latest achievements of machine learning include:
•Winning DOTA2 against the professional players (OpenAI’s development)
• Beating Lee Sedol at the traditional game of Go (Google DeepMind’s algorithm)
• Google saving up to 40% of electricity in its data centers by using Machine Learning
• Writing entire essays and poetry, and creating movies from scratch using Natural Language
Processing (NLP) techniques (Multiple breakthroughs, the latest being OpenAI’s GPT-2)
• Creating and generating images and videos from scratch (this is both incredibly creative and
worryingly accurate)
• Building automated machine learning models. This is revolutionizing the field by expanding the
circle of people who can work with machine learning to include non-technical folks as well
• Building machine learning models in the browser itself! (A Google creation – TensorFlow.js)
What are some of the Challenges in the adoption of Machine Learning?
While machine learning has made tremendous progress in the last few years, there are some big challenges that
still need to be solved. It is an area of active research and I expect a lot of effort to solve these problems in the
coming time.
Huge data required: It takes a huge amount of data to train a model today. For example – if you want to
classify Cats vs. Dogs based on images (and you don’t use an existing model) – you would need the model to be
trained on thousands of images. Compare that to a human – we typically explain the difference between Cat and
Dog to a child by using 2 or 3 photos
High compute required: As of now, machine learning and deep learning models require huge computations
to achieve simple tasks (simple according to humans). This is why the use of special hardware including GPUs
and TPUs is required. The cost of computations needs to come down for machine learning to make a next-level
impact
Interpretation of models is difficult at times: Some modeling techniques can give us high accuracy but
are difficult to explain. This can leave the business owners frustrated. Imagine being a bank, but you cannot tell
why you declined a loan for a customer!
New and better algorithms required: Researchers are consistently looking out for new and better
algorithms to address some of the problems mentioned above
More Data Scientists needed: Further, since the domain has grown so quickly, there aren’t many people
with the skill sets required to solve the vast variety of problems. This is expected to remain so for the next few
years. So, if you are thinking about building a career in machine learning, you are in a good position!
Types of Learning
For instance, suppose you are given a basket filled with different kinds of fruits. Now the first
step is to train the machine with all the different fruits one by one like this:

• If the shape of the object is rounded and has a depression at the top, is red in color, then it
will be labeled as –Apple.
• If the shape of the object is a long curving cylinder having Green-Yellow color, then it will
be labeled as –Banana.
Now suppose that, after training on the data, you are given a new, separate fruit, say a banana from the
basket, and are asked to identify it.

Since the machine has already learned from the previous data, this time it has to use that knowledge
wisely. It will first classify the fruit by its shape and color, confirm the fruit name
as BANANA, and put it in the Banana category. Thus the machine learns from the
training data (the basket containing fruits) and then applies that knowledge to the test data (the new fruit).
Supervised learning is classified into two categories of algorithms:
• Classification: A classification problem is when the output variable is a category, such as
“Red” or “blue” , “disease” or “no disease”.
• Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.
Supervised learning deals with or learns with “labeled” data. This implies that some data is
already tagged with the correct answer.
Steps

How Supervised Learning Works?

In supervised learning, models are trained using a labelled dataset, where the model learns about
each type of data. Once the training process is completed, the model is tested on the basis of test
data (a held-out subset of the dataset), and then it predicts the output.

The working of Supervised learning can be easily understood by the below example and
diagram:

Suppose we have a dataset of different types of shapes which includes square, rectangle, triangle,
and Polygon. Now the first step is that we need to train the model for each shape.

o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as hexagon.

Now, after training, we test our model using the test set, and the task of the model is to identify
the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies
the shape on the basis of its number of sides and predicts the output.

1. Regression

Regression algorithms are used if there is a relationship between the input variable and the
output variable. It is used for the prediction of continuous variables, such as Weather forecasting,
Market Trends, etc. Below are some popular Regression algorithms which come under
supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which means it takes one of a
set of classes such as Yes-No, Male-Female, True-False, etc. Spam filtering is a common example of a
classification problem.

Below are some popular Classification algorithms which come under supervised learning (a short sketch
follows the list):

o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
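A minimal sketch of one of these classifiers, assuming scikit-learn and its bundled breast-cancer dataset (the dataset choice and the iteration cap are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Binary classification: each sample is labelled 0 (malignant) or 1 (benign)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)  # larger iteration cap so the solver converges
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```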

Advantages of Supervised learning:

o With the help of supervised learning, the model can predict the output on the basis of
prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning model helps us to solve various real-world problems such as fraud
detection, spam filtering, etc.
o Supervised learning allows us to collect data and produce outputs based on previous
experience.
o Helps to optimize performance criteria with the help of experience.
o Supervised machine learning helps to solve various types of real-world computation
problems.

Disadvantages of supervised learning:

o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from the
training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.
o Classifying big data can be challenging.

Unsupervised

For instance, suppose the machine is given an image containing both dogs and cats, which it has never seen before.

The machine has no idea about the features of dogs and cats, so it cannot categorize the image as
'dogs and cats'. But it can categorize the animals according to their similarities, patterns, and
differences; i.e., we can easily divide the picture into two parts. The first part may
contain all the pictures having dogs in them, and the second part may contain all the pictures having cats in
them. Here the machine has not learned anything beforehand, which means there is no training data or examples.
Unsupervised learning allows the model to work on its own to discover patterns and information that were
previously undetected. It mainly deals with unlabelled data.
Unsupervised learning is classified into two categories of algorithms:
• Clustering: A clustering problem is where you want to discover the inherent groupings in
the data, such as grouping customers by purchasing behavior.
• Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.
Types of Unsupervised Learning:-
Clustering
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Types:-
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
Supervised vs. Unsupervised Machine Learning

Input Data: Supervised algorithms are trained using labeled data; unsupervised algorithms are used against data that is not labeled.

Computational Complexity: Supervised learning is the simpler method; unsupervised learning is computationally complex.

Accuracy: Supervised learning is highly accurate; unsupervised learning is less accurate.

No. of classes: In supervised learning the number of classes is known; in unsupervised learning it is not known.

Data Analysis: Supervised learning uses offline analysis; unsupervised learning uses real-time analysis of data.

Algorithms used: Supervised learning uses Linear and Logistic Regression, Random Forest, Support Vector Machine, Neural Network, etc.; unsupervised learning uses K-Means clustering, Hierarchical clustering, the Apriori algorithm, etc.


Unit-2 Linear Regression Numericals
Linear regression is the most basic and commonly used predictive analysis. One variable is considered to
be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler
might want to relate the weights of individuals to their heights using a linear regression model.
There are several linear regression analyses available to the researcher.
Simple linear regression

• One dependent variable (interval or ratio)


• One independent variable (interval or ratio or dichotomous)
Multiple linear regression

• One dependent variable (interval or ratio)


• Two or more independent variables (interval or ratio or dichotomous)
Logistic regression

• One dependent variable (binary)


• Two or more independent variable(s) (interval or ratio or dichotomous)
Ordinal regression

• One dependent variable (ordinal)


• One or more independent variable(s) (nominal or dichotomous)
Multinomial regression

• One dependent variable (nominal)


• One or more independent variable(s) (interval or ratio or dichotomous)
Discriminant analysis

• One dependent variable (nominal)


• One or more independent variable(s) (interval or ratio)
The formula for the linear regression equation is given by:

y = a + bx

a and b are given by the following formulas:

b (slope) = (n∑xy − (∑x)(∑y)) / (n∑x² − (∑x)²)

a (intercept) = (∑y∑x² − ∑x∑xy) / (n∑x² − (∑x)²)

Where,
x and y are two variables on the regression line,
b = slope of the line,
a = y-intercept of the line,
x = values of the first data set,
y = values of the second data set.

Solved Examples
Question: Find linear regression equation for the following two sets of data:

x    2    4    6    8
y    3    7    5    10

Solution:
Construct the following table:

x      y      x²      xy
2      3      4       6
4      7      16      28
6      5      36      30
8      10     64      80

∑x = 20    ∑y = 25    ∑x² = 120    ∑xy = 144
b = (n∑xy − (∑x)(∑y)) / (n∑x² − (∑x)²) = (4 × 144 − 20 × 25) / (4 × 120 − 20²) = 76 / 80

b = 0.95

a = (∑y∑x² − ∑x∑xy) / (n∑x² − (∑x)²) = (25 × 120 − 20 × 144) / (4 × 120 − 20²) = 120 / 80

a = 1.5

The linear regression equation is given by:

y = a + bx

y = 1.5 + 0.95x
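A small sketch that reproduces this calculation in plain Python (no libraries assumed), just to check the slope and intercept:

```python
x = [2, 4, 6, 8]
y = [3, 7, 5, 10]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(a * b for a, b in zip(x, y))

# The same least-squares formulas as above
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)

print(a, b)  # 1.5 0.95
```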
Linear Regression
Problems with Solutions

Linear regression and modelling problems are presented along with their solutions at the bottom of the
page. Also a linear regression calculator and grapher may be used to check answers and create more
opportunities for practice.

Review
If the plot of n pairs of data (x, y) for an experiment appears to indicate a "linear relationship" between y
and x, then the method of least squares may be used to write a linear relationship between x and y.
The least squares regression line is the line that minimizes the sum of the squares (d1² + d2² + d3² + d4²) of
the vertical deviations from each data point to the line (see the figure below as an example of 4 points).

Figure 1. Linear regression where the sum of squared vertical distances d1² + d2² + d3² + d4² between observed and
predicted (line and its equation) values is minimized.

The least squares regression line for the set of n data points is given by the equation of a line in slope-intercept form:

y = a x + b

where a and b are given by

a = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)

b = (1/n)(Σy − aΣx)

Figure 2. Formulas for the constants a and b included in the linear regression.

• Problem 1

Consider the following set of points: {(-2 , -1) , (1 , 1) , (3 , 2)}


a) Find the least square regression line for the given data points.
b) Plot the given points and the regression line in the same rectangular system of axes.

• Problem 2

a) Find the least square regression line for the following set of data

{(-1 , 0),(0 , 2),(1 , 4),(2 , 5)}

b) Plot the given points and the regression line in the same rectangular system of axes.

• Problem 3

The values of x and their corresponding values of y are shown in the table below

x 0 1 2 3 4

y 2 3 5 4 6

a) Find the least square regression line y = a x + b.


b) Estimate the value of y when x = 10.

• Problem 4

The sales of a company (in million dollars) for each year are shown in the table below.

x (year) 2005 2006 2007 2008 2009


y (sales) 12 19 29 37 45

a) Find the least square regression line y = a x + b.

b) Use the regression line to estimate the sales of the company in 2012.

Solutions to the Above Problems

1. a) Let us organize the data in a table.

x      y      xy     x²
-2     -1     2      4
1      1      1      1
3      2      6      9

Σx = 2    Σy = 2    Σxy = 9    Σx² = 14

We now use the above formulas to calculate a and b as follows:

a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (3×9 - 2×2) / (3×14 - 2²) = 23/38

b = (1/n)(Σy - aΣx) = (1/3)(2 - (23/38)×2) = 5/19

b) We now graph the regression line given by y = a x + b and the given points.

Figure 3. Graph of linear regression in problem 1.

2. a) We use a table as follows:

x      y      xy     x²
-1     0      0      1
0      2      0      0
1      4      4      1
2      5      10     4

Σx = 2    Σy = 11    Σxy = 14    Σx² = 6

We now use the above formulas to calculate a and b as follows:

a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (4×14 - 2×11) / (4×6 - 2²) = 17/10 = 1.7

b = (1/n)(Σy - aΣx) = (1/4)(11 - 1.7×2) = 1.9

b) We now graph the regression line given by y = ax + b and the given points.

Figure 4. Graph of linear regression in problem 2.

3. a) We use a table to calculate a and b.

x      y      xy     x²
0      2      0      0
1      3      3      1
2      5      10     4
3      4      12     9
4      6      24     16

Σx = 10    Σy = 20    Σxy = 49    Σx² = 30

We now calculate a and b using the least squares regression formulas for a and b.

a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) = (5×49 - 10×20) / (5×30 - 10²) = 0.9

b = (1/n)(Σy - aΣx) = (1/5)(20 - 0.9×10) = 2.2

b) Now that we have the least squares regression line y = 0.9x + 2.2, substitute x by 10 to find the
value of the corresponding y.

y = 0.9 × 10 + 2.2 = 11.2

4. a) We first change the variable x into t such that t = x - 2005, so that t represents the
number of years after 2005. Using t instead of x makes the numbers smaller and therefore more
manageable. The table of values becomes:

t (years after 2005)   0    1    2    3    4
y (sales)              12   19   29   37   45

We now use the table to calculate a and b included in the least squares regression line formula.

t      y      ty     t²
0      12     0      0
1      19     19     1
2      29     58     4
3      37     111    9
4      45     180    16

Σt = 10    Σy = 142    Σty = 368    Σt² = 30

We now calculate a and b using the least squares regression formulas for a and b.

a = (nΣty - ΣtΣy) / (nΣt² - (Σt)²) = (5×368 - 10×142) / (5×30 - 10²) = 8.4

b = (1/n)(Σy - aΣt) = (1/5)(142 - 8.4×10) = 11.6

b) In 2012, t = 2012 - 2005 = 7

The estimated sales in 2012 are: y = 8.4 × 7 + 11.6 = 70.4 million dollars.
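A small sketch that checks this last result with NumPy's polyfit (a convenience routine; the formulas above give the same answer):

```python
import numpy as np

t = np.array([0, 1, 2, 3, 4])         # years after 2005
y = np.array([12, 19, 29, 37, 45])    # sales in million dollars

a, b = np.polyfit(t, y, deg=1)        # slope and intercept of the least squares line
print(round(a, 1), round(b, 1))       # 8.4 11.6
print(round(a * 7 + b, 1))            # estimated 2012 sales: 70.4
```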

Example 9.9

Calculate the regression coefficient and obtain the lines of regression for the following data

Solution:

Regression coefficient of X on Y
(i) Regression equation of X on Y

(ii) Regression coefficient of Y on X

(iii) Regression equation of Y on X


Y = 0.929X–3.716+11

= 0.929X+7.284

The regression equation of Y on X is Y= 0.929X + 7.284

Example 9.10

Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations
from a actual means of X and Y.

Estimate the likely demand when the price is Rs.20.

Solution:

Calculation of Regression equation

(i) Regression equation of X on Y


(ii) Regression Equation of Y on X

When X is 20, Y will be

= –0.25 (20)+44.25

= –5+44.25

= 39.25 (when the price is Rs. 20, the likely demand is 39.25)

Example 9.11

Obtain regression equation of Y on X and estimate Y when X=55 from the following

Solution:
(i) Regression coefficients of Y on X
(ii) Regression equation of Y on X

Y – 51.57 = 0.942(X – 48.29)

Y = 0.942X – 45.49 + 51.57

Y = 0.942X + 6.08

The regression equation of Y on X is Y= 0.942X+6.08 Estimation of Y when X= 55

Y= 0.942(55)+6.08=57.89

Example 9.12

Find the means of X and Y variables and the coefficient of correlation between them from the following
two regression equations:

2Y–X–50 = 0

3Y–2X–10 = 0.

Solution:

We are given

2Y–X–50 = 0 ... (1)

3Y–2X–10 = 0 ... (2)

Solving equation (1) and (2)

We get Y = 90

Putting the value of Y in equation (1)

We get X = 130

Calculating correlation coefficient

Let us assume equation (1) be the regression equation of Y on X

2Y = X+50
Example 9.13

Find the means of X and Y variables and the coefficient of correlation between them from the following
two regression equations:

4X–5Y+33 = 0

20X–9Y–107 = 0

Solution:

We are given

4X–5Y+33 = 0 ... (1)

20X–9Y–107 =0 ... (2)

Solving equation (1) and (2)

We get Y = 17

Putting the value of Y in equation (1)

We get X = 13

Calculating correlation coefficient

Let us assume equation (1) be the regression equation of X on Y


Let us assume equation (2) be the regression equation of Y on X

But this is not possible, because both regression coefficients would be greater than one, and the product of
the two regression coefficients (which equals r²) cannot exceed 1.

So our above assumption is wrong. Therefore, treating equation (1) as the regression equation of Y on X and
equation (2) as the regression equation of X on Y, we get

Example 9.16

For 5 pairs of observations the following results are obtained ∑X=15, ∑Y=25, ∑X2 =55, ∑Y2 =135,
∑XY=83 Find the equation of the lines of regression and estimate the value of X on the first line
when Y=12 and value of Y on the second line if X=8.

Solution:
Y – 5 = 0.8(X – 3)

Y = 0.8X + 2.6

When X = 8, the value of Y is estimated as

Y = 0.8(8) + 2.6 = 9
Unit-2

Regression Analysis in Machine learning

Regression analysis is a statistical method for modelling the relationship between a dependent (target)
variable and one or more independent (predictor) variables. More specifically, regression
analysis helps us understand how the value of the dependent variable changes with respect to an
independent variable when the other independent variables are held fixed. It predicts continuous/real values
such as temperature, age, salary, price, etc.

We can understand the concept of regression analysis using the below example:

Example: Suppose there is a marketing company A, which runs various advertisements every year and gets
sales from them. The list below shows the advertising spend of the company in the last 5 years and the
corresponding sales:

Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the
prediction of the sales for this year. To solve this type of prediction problem in machine
learning, we need regression analysis.

Regression is a supervised learning technique which helps in finding the correlation between variables
and enables us to predict the continuous output variable based on the one or more predictor variables. It is
mainly used for prediction, forecasting, time series modeling, and determining the causal-effect
relationship between variables.

In Regression, we fit a line or curve to the given data points; using this plot,
the machine learning model can make predictions about the data. In simple words, "Regression shows a
line or curve that passes through the data points on the target-predictor graph in such a way that the
vertical distance between the data points and the regression line is minimum." The distance between the
data points and the line tells whether the model has captured a strong relationship or not.

Some examples of regression can be as:


o Prediction of rain using temperature and other factors
o Determining Market trends
o Prediction of road accidents due to rash driving.

Terminologies Related to the Regression Analysis:


o Dependent Variable: The main factor in Regression analysis which we want to predict or
understand is called the dependent variable. It is also called target variable.
o Independent Variable: The factors which affect the dependent variables or which are used to
predict the values of the dependent variables are called independent variable, also called as
a predictor.
o Outliers: An outlier is an observation that has either a very low or a very high value in
comparison to the other observed values. An outlier may hamper the result, so it should be avoided.
o Multicollinearity: If the independent variables are highly correlated with each other,
this condition is called multicollinearity. It should not be present in the dataset,
because it creates problems when ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but not well
with test dataset, then such problem is called Overfitting. And if our algorithm does not perform
well even with training dataset, then such problem is called underfitting.

Why do we use Regression Analysis?

As mentioned above, regression analysis helps in the prediction of a continuous variable. There are
various scenarios in the real world where we need future predictions, such as weather conditions,
sales prediction, marketing trends, etc., and for such cases we need a technique that can make
predictions accurately. Regression analysis is such a technique: a statistical method
used in machine learning and data science. Below are some other reasons for using regression
analysis:

o Regression estimates the relationship between the target and the independent variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing the regression, we can confidently determine the most important factor, the
least important factor, and how each factor is affecting the other factors.

Types of Regression

There are various types of regressions which are used in data science and machine learning. Each type has
its own importance on different scenarios, but at the core, all the regression methods analyze the effect of
the independent variable on dependent variables. Here we are discussing some important types of
regression which are given below:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the very simple and easy algorithms which works on regression and shows the
relationship between the continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the
dependent variable (Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
o The relationship between variables in the linear regression model can be explained using the
below image. Here we are predicting the salary of an employee on the basis of the year of
experience.
o Below is the mathematical equation for Linear regression:

1. Y= aX+b

Here, Y = dependent variable (target variable),

X = independent variable (predictor variable),
a and b are the linear coefficients.

Some popular applications of linear regression are:

o Analyzing trends and sales estimates


o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic.

The linear regression model provides a sloped straight line representing the relationship between the
variables. Consider the below image:

Mathematically, we can represent a linear regression as:

y= a0+a1x+ ε

Here,

Y= Dependent Variable (Target Variable)


X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value).
ε = random error

The values for x and y variables are training datasets for Linear Regression model representation.

A Linear Regression model’s main aim is to find the best fit linear line and the optimal values of
intercept and coefficients such that the error is minimized.
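A minimal sketch of fitting such a line and reading off the intercept and coefficient, assuming scikit-learn (the experience/salary numbers are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: years of experience vs. salary (in thousands)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([35, 40, 48, 55, 61, 68])

model = LinearRegression().fit(X, y)
print("intercept a0:", model.intercept_)
print("coefficient a1:", model.coef_[0])
print("predicted salary at 7 years:", model.predict([[7]])[0])
```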
In the above diagram,

• x is our independent variable, which is plotted on the x-axis, and y is the dependent variable, which is
plotted on the y-axis.

• Black dots are the data points i.e the actual values.

• b0 is the intercept, which is 10, and b1 is the slope of the x variable.

• The blue line is the best fit line predicted by the model i.e the predicted values lie on the blue line.

The vertical distance between the data point and the regression line is known as error or
residual. Each data point has one residual and the sum of all the differences is known as the Sum of
Residuals/Errors.

Mathematical Approach:

Residual/Error = Actual value − Predicted value

Sum of Residuals/Errors = Sum(Actual − Predicted values)

Sum of Squared Residuals/Errors = Sum((Actual − Predicted values)²)
Linear Regression: Hypothesis Function, Cost Function, and Gradient Descent

For simplicity, we will first consider Linear Regression with only one variable:-

Model Representation:-

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to
learn a function h: X → Y, so that h(x) is a ‘good’ predictor for the corresponding y. h(x) is known as the
hypothesis function.
Now the picture might seem clear to you. Our main task is to design the h function.
What we are trying to achieve is to plot all the data on a graph with the input variable on the
independent (x) axis and the output on the y axis. In this way, we would have a direct plotting of
input to output. For example:-

In the above example, we have data for different houses. For different land areas for the house, we have
different prices for those houses. This is our training data. Now sketch this dataset in the graph.

Next time, whenever I enter the area of a new house, it will automatically tell me the price of that house
using this line.
I entered the area=30, and it predicted the price of approximately 195 dollars for us. Which, according to
our training set, is a reasonable price. So how do you teach your computer to predict a line that fits your
dataset? Let’s dive into the mathematics:-

The Hypothesis Function:-

The hypothesis function for this case is:-

hθ(x) = θ0 + θ1x

Don’t be overwhelmed if you are not familiar with that equation. Let me dive into the mathematics behind
this.
I thought that before considering the formula, you should have a reference to different terms used in this.
You might be familiar with the formula for a line using the slope and y-intercept.
y=mx+b
Refer to Khan Academy if you are not familiar with this equation

This equation is used to represent lines in the intercept form. Our hypothesis function is exactly the same
as the equation of a line.

So, theta1 is the slope (m) and theta0 is the intercept (b). Now you are familiar with the hypothesis
function and why we are using it [of course we want to fit a line to our graph, and this is the
equation of a line].

Squared Error Cost Function:-

At this stage, our primary goal is to minimize the difference between the line and each point. This is done
by tweaking the values of the slope of the line(theta1) and the y-intercept(theta0) of the line. So, we have
to find theta0 and theta1 for which the line has the smallest error.
What do I mean by minimum error? Let’s consider our above prediction.
The lines show the distance of each point from the line. When we sum up the difference for all the points,
it gives us the error in that line. So we have to minimize the error to gain an optimal solution. What we can
do is move the line a little bit higher, lower, change the angle by tweaking the values of theta0 and theta1.
But don’t worry about that, our program will do the hard task for us.

To achieve this, we will use dummy values for theta0 and theta1, put it in our hypothesis function, and
calculate the cost for that line. Repeat this step until we reach the minimum cost. How will we know what
the minimum cost is? I will come to that, but first, have a look at the function that calculates cost.

Recall our table for different prices of the house.


Don’t worry about scrolling up.

Clear about the different symbols?


Now come back to our line and the error function. To understand the cost function, we have to take help
from calculus. Consider the graph again. Let’s try to calculate the cost for each point and the line manually.

So we are subtracting each point from the line. The point on the line that is precisely below a specific point
can be found by putting the value of x in the line equation.[If you don’t know about the equation of a line,
first consider it by watching some tutorials on the internet.]

Now, sum up all the terms using the summation sigma. The limit for the values to be summed is equal to
the number of points, and each point refers to a particular training example, so our i varies from 1 to m.
Now take the difference between the hypothesis function and y, and square it to account for the negative
values. Divide the summation by 2m; the extra factor of 2 is just to make the later computation (the
derivatives) simpler, and you can also neglect it.
You now have your error function, written out below. Remember what I said about tweaking the values of
theta0 and theta1: this is how we will calculate the cost for each pair of values of theta0 and theta1.
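Written out, the squared error cost function described above takes the standard form (with m training examples and the earlier hypothesis hθ(x) = θ0 + θ1x):

\[ J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 \]

Minimizing J over θ0 and θ1 gives the line that best fits the training data.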

Now, our main task is to predict the price of a new house using this dataset. This is achieved using Linear
Regression. What we do is fit a line into our dataset in such a way that it minimizes the distance from each
point.

One prediction would be the above blue line. Note that I have tried to draw the line in such a way that it is
close relative to all the points. So we have to choose such a line that perfectly fit our data set.
Assumptions of Linear Regression

The basic assumptions of Linear Regression are as follows:

1. Linearity: It states that the dependent variable Y should be linearly related to independent variables.
This assumption can be checked by plotting a scatter plot between both variables.

2. Normality: The X and Y variables should be normally distributed. Histograms, KDE plots, Q-Q plots
can be used to check the Normality assumption.

3. Homoscedasticity: The variance of the error terms should be constant i.e. the spread of residuals should
be constant for all values of X. This assumption can be checked by plotting a residual plot. If the
assumption is violated then the points will form a funnel shape otherwise they will be constant.

4. Independence/No Multicollinearity: The variables should be independent of each other, i.e., there

should be no correlation between the independent variables. To check this assumption, we can use a
correlation matrix or the VIF (variance inflation factor) score. If the VIF score is greater than 5, then the
variables are highly correlated.

In the VIF method, we pick each feature and regress it against all of the other features. For each such regression,
the factor is calculated as:

VIF = 1 / (1 − R²)

where R-squared is the coefficient of determination of that regression; its value lies between 0 and 1.
As we see from the formula, the greater the value of R-squared, the greater the VIF. Hence, a greater VIF
denotes greater correlation. This is in agreement with the fact that a higher R-squared value denotes
stronger collinearity. Generally, a VIF above 5 indicates high multicollinearity. A short sketch of computing
VIF scores follows.
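A small sketch of computing VIF scores, assuming statsmodels and pandas are installed (the toy feature matrix is an illustrative assumption):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.RandomState(3)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=100)  # strongly correlated with x1
x3 = rng.normal(size=100)                        # roughly independent feature
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Add an intercept column so each auxiliary regression has a constant term
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns):
    if col == "const":
        continue
    print(col, round(variance_inflation_factor(X_const.values, i), 2))
# x1 and x2 should show VIF well above 5, while x3 stays near 1
```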

R-squared is a statistical measure that represents the goodness of fit of a regression model. The ideal
value for r-square is 1. The closer the value of r-square to 1, the better is the model fitted.
R-squared is a comparison of the residual sum of squares (SSres) with the total sum of squares (SStot). The
total sum of squares is calculated by summing the squares of the vertical distances between the data
points and the average (mean) line.

The residual sum of squares is calculated by summing the squares of the vertical distances
between the data points and the best-fitted line.

R-squared is calculated using the following formula:

R² = 1 − (SSres / SStot)

where SSres is the residual sum of squares and SStot is the total sum of squares.
The goodness of fit of regression models can be analyzed on the basis of the R-squared method: the
closer the value of R-squared is to 1, the better the model.
Note: The value of R-square can also be negative when the model fitted is worse than the average fitted
model.
Limitation of using the R-square method –
• The value of r-square always increases or remains the same as new variables are added to the model,
without detecting the significance of this newly added variable (i.e value of r-square never decreases
on the addition of new attributes to the model). As a result, non-significant attributes can also be
added to the model with an increase in the r-square value.
• This is because SStot is always constant and the regression model tries to decrease the value
of SSres by finding some correlation with this new attribute hence the overall value of r-square
increases, which can lead to a poor regression model.
5. The error terms should be normally distributed. Q-Q plots and Histograms can be used to check the
distribution of error terms.

6. No Autocorrelation: The error terms should be independent of each other. Autocorrelation can be
tested using the Durbin Watson test. The null hypothesis assumes that there is no autocorrelation. The
value of the test lies between 0 to 4. If the value of the test is 2 then there is no autocorrelation.
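A small sketch of the Durbin-Watson check, assuming statsmodels and using the residuals of a fitted OLS model (the data is an illustrative assumption):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.RandomState(4)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

residuals = sm.OLS(y, X).fit().resid
print("Durbin-Watson statistic:", round(durbin_watson(residuals), 2))  # close to 2 -> no autocorrelation
```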

How to deal with the Violation of any of the Assumption

The Violation of the assumptions leads to a decrease in the accuracy of the model therefore the predictions
are not accurate and error is also high.
For example, if the Independence assumption is violated then the relationship between the independent
and dependent variable cannot be determined precisely.

There are various methods and techniques available to deal with the violation of the assumptions. Let’s
discuss some of them below.

Violation of the Normality assumption of variables or error terms

To treat this problem, we can transform the variables towards a normal distribution using various
transformation functions such as the log transformation, the reciprocal transformation, or the Box-Cox
transformation.

Violation of Multi-Collineraity Assumption

It can be dealt with by:

• Doing nothing (if there is no major difference in the accuracy)

• Removing some of the highly correlated independent variables.

• Deriving a new feature by linearly combining the independent variables, such as adding them together
or performing some mathematical operation.

• Performing an analysis designed for highly correlated variables, such as principal components analysis.
Types of Linear Regression

Linear regression can be further divided into two types of the algorithm:

o Simple Linear Regression:


If a single independent variable is used to predict the value of a numerical dependent variable,
then such a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear regression:
If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

Linear Regression Line

A linear line showing the relationship between the dependent and independent variables is called
a regression line. A regression line can show two types of relationship:

o Positive Linear Relationship:


If the dependent variable increases on the Y-axis and independent variable increases on X-axis,
then such a relationship is termed as a Positive linear relationship.

o Negative Linear Relationship:


If the dependent variable decreases on the Y-axis and independent variable increases on the X-
axis, then such a relationship is called a negative linear relationship.
While training and building a regression model, it is these coefficients which are learned and fitted to
training data. The aim of the training is to find the best fit line such that cost function is minimized. The
cost function helps in measuring the error. During the training process, we try to minimize the error
between actual and predicted values and thus minimizing the cost function.

In the figure, the red points are the data points and the blue line is the predicted line for the training data.
To get the predicted value, these data points are projected on to the line.

To summarize, our aim is to find the values of the coefficients that minimize the cost function. The
most common cost function is Mean Squared Error (MSE), which is equal to the average squared
difference between an observation’s actual and predicted values. The coefficient values can be calculated
using Gradient Descent. To give a brief understanding: in gradient descent we start with some
random values of the coefficients, compute the gradient of the cost function at these values, update the
coefficients, and calculate the cost function again. This process is repeated until we find a minimum of
the cost function.

Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line that means the error
between predicted values and actual values should be minimized. The best fit line will have the least
error.

The different values for weights or the coefficient of lines (a0, a1) gives a different line of regression, so
we need to calculate the best values for a0 and a1 to find the best fit line, so to calculate this we use cost
function.

Cost function-
o The different values for weights or coefficient of lines (a0, a1) gives the different line of
regression, and the cost function is used to estimate the values of the coefficient for the best fit
line.
o Cost function optimizes the regression coefficients or weights. It measures how a linear
regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input
variable to the output variable. This mapping function is also known as Hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the
squared errors between the predicted values and the actual values. For the linear equation above, MSE can
be calculated as:

MSE = (1/N) Σ (Yi − (a1xi + a0))², with the sum taken over i = 1 to N

Where,

N = total number of observations
Yi = actual value
(a1xi + a0) = predicted value.

Residuals: The distance between an actual value and the predicted value is called a residual. If the observed
points are far from the regression line, the residuals will be high and so the cost function will be high. If the
scatter points are close to the regression line, the residuals will be small and hence the cost function will be small.

Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the
cost function.
o It starts with randomly selected coefficient values and then iteratively updates them to reach the
minimum of the cost function. A small sketch follows.
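A minimal sketch of gradient descent for simple linear regression in plain Python/NumPy (the learning rate and iteration count are assumed values, not tuned):

```python
import numpy as np

# Illustrative data roughly following y = 2x + 1
rng = np.random.RandomState(5)
x = np.linspace(0, 5, 50)
y = 2 * x + 1 + rng.normal(scale=0.3, size=50)

a0, a1 = 0.0, 0.0     # start from arbitrary coefficient values
lr, n = 0.01, len(x)  # learning rate (assumed) and number of samples

for _ in range(5000):
    pred = a1 * x + a0
    error = pred - y
    # Gradients of the MSE cost with respect to a0 and a1
    a0 -= lr * (2 / n) * error.sum()
    a1 -= lr * (2 / n) * (error * x).sum()

print("intercept a0 ~", round(a0, 2), " slope a1 ~", round(a1, 2))
```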

Model Performance:

The Goodness of fit determines how the line of regression fits the set of observations. The process of
finding the best model out of various models is called optimization. It can be achieved by below method:

1. R-squared method:

o R-squared is a statistical measure that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and independent variables on a
scale of 0-100%.
o A high value of R-squared indicates less difference between the predicted and actual
values and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient of multiple determination for
multiple regression.
o It can be calculated from the formula below:

R-squared = Explained variation / Total variation
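
As a small hedged sketch (the actual and predicted values below are assumed, purely for illustration), R-squared can also be computed directly with scikit-learn's r2_score once actual and predicted values are available:

1. from sklearn.metrics import r2_score
2. y_actual= [3.0, 5.0, 7.0, 9.0]           # assumed observed values
3. y_predicted= [2.8, 5.1, 6.9, 9.3]        # assumed model predictions
4. print(r2_score(y_actual, y_predicted))   # closer to 1 (100%) means a better fit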
ML Polynomial Regression

o Polynomial Regression is a regression algorithm that models the relationship between a
dependent variable (y) and an independent variable (x) as an nth degree polynomial. The Polynomial Regression
equation is given below:

y = b0 + b1x1 + b2x1² + b3x1³ + ...... + bnx1ⁿ


o It is also called a special case of Multiple Linear Regression in ML, because we add some
polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial
Regression.
o It is a linear model with some modifications made in order to increase the accuracy.
o The dataset used in Polynomial Regression for training is of a non-linear nature.
o It makes use of a linear regression model to fit complicated, non-linear functions and
datasets.
o Hence, "In Polynomial regression, the original features are converted into Polynomial
features of required degree (2,3,..,n) and then modeled using a linear model."

Need for Polynomial Regression:

The need of Polynomial Regression in ML can be understood in the below points:

o If we apply a linear model to a linear dataset, it provides a good result, as we have seen
in Simple Linear Regression. But if we apply the same model without any modification to a non-
linear dataset, it will produce poor output: the loss function will increase, the error rate will be
high, and the accuracy will decrease.
o So for such cases, where the data points are arranged in a non-linear fashion, we need the
Polynomial Regression model. We can understand it better using the comparison diagram of a
linear dataset and a non-linear dataset below.

o In the above image, we have taken a dataset which is arranged non-linearly. So if we try to cover
it with a linear model, we can clearly see that it hardly covers any data points. A curve, on the
other hand, covers most of the data points, which is what the Polynomial model provides.
o Hence, if the dataset is arranged in a non-linear fashion, we should use the Polynomial
Regression model instead of Simple Linear Regression.
Note: A Polynomial Regression algorithm is also called Polynomial Linear Regression because its
linearity does not depend on the variables; it depends on the coefficients, which enter the equation in a
linear fashion.

Equation of the Polynomial Regression Model:

Simple Linear Regression equation: y = b0 + b1x .........(a)

Multiple Linear Regression equation: y = b0 + b1x1 + b2x2 + b3x3 + .... + bnxn .........(b)

Polynomial Regression equation: y = b0 + b1x + b2x² + b3x³ + .... + bnxⁿ ..........(c)

When we compare the above three equations, we can clearly see that all three are polynomial
equations, but they differ in the degree of their variables. The Simple and Multiple Linear equations are
polynomial equations of degree one, and the Polynomial Regression equation is a linear equation of
degree n. So if we add higher degrees to our linear equation, it is converted into a Polynomial Linear
equation.
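
As a small illustrative sketch (the numbers here are assumed and separate from the salary example that follows), scikit-learn's PolynomialFeatures class performs exactly this conversion, expanding a single feature x into the columns 1, x, x², x³, which are then fitted with an ordinary linear model:

1. import numpy as nm
2. from sklearn.preprocessing import PolynomialFeatures
3. x= nm.array([[2.0], [3.0]])        # assumed single-feature values
4. poly= PolynomialFeatures(degree= 3)
5. print(poly.fit_transform(x))       # columns are 1, x, x^2, x^3
6. # [[ 1.  2.  4.  8.]
7. #  [ 1.  3.  9. 27.]]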

Note: To better understand Polynomial Regression, you must have knowledge of Simple Linear
Regression.

Implementation of Polynomial Regression using Python:

Here we will implement the Polynomial Regression using Python. We will understand it by comparing
Polynomial Regression model with the Simple Linear Regression model. So first, let's understand the
problem for which we are going to build the model.

Problem Description: There is a Human Resources company which is going to hire a new candidate. The
candidate has stated that his previous salary was 160K per annum, and HR has to check whether he is telling
the truth or bluffing. To identify this, they only have a dataset from his previous company, in which the
salaries of the top 10 positions are mentioned along with their levels. By checking the available dataset, we have
found that there is a non-linear relationship between the position levels and the salaries. Our goal is to
build a Bluffing Detector regression model, so HR can hire an honest candidate. Below are the steps to
build such a model.
Steps for Polynomial Regression:

The main steps involved in Polynomial Regression are given below:

o Data Pre-processing
o Build a Linear Regression model and fit it to the dataset
o Build a Polynomial Regression model and fit it to the dataset
o Visualize the result for Linear Regression and Polynomial Regression model.
o Predicting the output.

Note: Here, we will build the Linear Regression model as well as the Polynomial Regression model to
compare their predictions; the Linear Regression model serves as a reference.

Data Pre-processing Step:

The data pre-processing step will remain the same as in the previous regression models, except for some
changes. In the Polynomial Regression model, we will not use feature scaling, and we will also not split
our dataset into training and test sets. There are two reasons for this:

o The dataset contains very little information, so it is not suitable to divide it into a test and training
set; otherwise our model will not be able to find the correlations between the salaries and levels.
o In this model, we want very accurate predictions for salary, so the model should have as much
information as possible.

The code for pre-processing step is given below:

1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
5.
6. #importing datasets
7. data_set= pd.read_csv('Position_Salaries.csv')
8.
9. #Extracting Independent and dependent Variable
10. x= data_set.iloc[:, 1:2].values
11. y= data_set.iloc[:, 2].values

Explanation:

o In the above lines of code, we have imported the important Python libraries needed to import the dataset and
operate on it.
o Next, we have imported the dataset 'Position_Salaries.csv', which contains three columns
(Position, Level, and Salary), but we will consider only two columns (Level and Salary).
o After that, we have extracted the dependent variable (y) and the independent variable (x) from the dataset. For
the x variable, we have used the index [:, 1:2], because we want the column at index 1 (Level), and included :2
so that x is a matrix rather than a vector.

Output:

By executing the above code, we can read our dataset as:

As we can see in the above output, there are three columns (Position, Level, and Salary). But
we are only considering two columns, because the Position column is equivalent to the Level column, or may be
seen as the encoded form of the Position.

Here we will predict the output for level 6.5, because the candidate has 4+ years' experience as a regional
manager, so he must be somewhere between level 6 and level 7.

Building the Linear regression model:

Now, we will build and fit the Linear regression model to the dataset. In building polynomial regression,
we will take the Linear regression model as reference and compare both the results. The code is given
below:

1. #Fitting the Linear Regression to the dataset


2. from sklearn.linear_model import LinearRegression
3. lin_regs= LinearRegression()
4. lin_regs.fit(x,y)

In the above code, we have created the Simple Linear model using lin_regs object
of LinearRegression class and fitted it to the dataset variables (x and y).
Output:

Out[5]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Building the Polynomial regression model:

Now we will build the Polynomial Regression model, but it will be a little different from the Simple
Linear model, because here we will use the PolynomialFeatures class of the preprocessing library. We are
using this class to add extra polynomial features to our dataset.

1. #Fitting the Polynomial regression to the dataset


2. from sklearn.preprocessing import PolynomialFeatures
3. poly_regs= PolynomialFeatures(degree= 2)
4. x_poly= poly_regs.fit_transform(x)
5. lin_reg_2 =LinearRegression()
6. lin_reg_2.fit(x_poly, y)

In the above lines of code, we have used poly_regs.fit_transform(x), because first we are converting our
feature matrix into a polynomial feature matrix, and then fitting it to the Polynomial Regression model. The
parameter value (degree= 2) is our choice; we can choose it according to the polynomial features we
want.

After executing the code, we will get another matrix x_poly, which can be seen under the variable
explorer option:
Next, we have used another LinearRegression object, namely lin_reg_2, to fit our x_poly vector to the
linear model.

Output:

Out[11]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

Visualizing the result for Linear regression:

Now we will visualize the result for Linear regression model as we did in Simple Linear Regression.
Below is the code for it:

1. #Visulaizing the result for Linear Regression model


2. mtp.scatter(x,y,color="blue")
3. mtp.plot(x,lin_regs.predict(x), color="red")
4. mtp.title("Bluff detection model(Linear Regression)")
5. mtp.xlabel("Position Levels")
6. mtp.ylabel("Salary")
7. mtp.show()

Output:

In the above output image, we can clearly see that the regression line is far from the data points.
Predictions are on the red straight line, and the blue points are the actual values. If we used this output to predict
the salary of a CEO, it would give a salary of approx. 600000$, which is far from the real value.

So we need a curved model, rather than a straight line, to fit the dataset.

Visualizing the result for Polynomial Regression

Here we will visualize the result of the Polynomial Regression model, the code for which is a little different from
the above model.

Code for this is given below:

1. #Visulaizing the result for Polynomial Regression


2. mtp.scatter(x,y,color="blue")
3. mtp.plot(x, lin_reg_2.predict(poly_regs.fit_transform(x)), color="red")
4. mtp.title("Bluff detection model(Polynomial Regression)")
5. mtp.xlabel("Position Levels")
6. mtp.ylabel("Salary")
7. mtp.show()

In the above code, we have used lin_reg_2.predict(poly_regs.fit_transform(x)) instead of x_poly, because
we want the linear regressor object to predict from the polynomial feature matrix.

Output:

As we can see in the above output image, the predictions are close to the real values. The above plot will
vary as we change the degree.

For degree= 3:

If we change the degree to 3, we will get a more accurate plot, as shown in the image below.

So, as we can see in the above output image, the predicted salary for level 6.5 is near 170K$-
190K$, which suggests that the future employee is telling the truth about his salary.
Degree= 4: Let's again change the degree to 4; now we will get the most accurate plot. Hence we can get
more accurate results by increasing the degree of the polynomial.

Predicting the final result with the Linear Regression model:

Now, we will predict the final output using the Linear Regression model to see whether the employee is
telling the truth or bluffing. For this, we will use the predict() method and pass the value 6.5. Below is
the code for it:

1. lin_pred = lin_regs.predict([[6.5]])
2. print(lin_pred)

Output:

[330378.78787879]

Predicting the final result with the Polynomial Regression model:

Now, we will predict the final output using the Polynomial Regression model and compare it with the Linear
model. Below is the code for it:

1. poly_pred = lin_reg_2.predict(poly_regs.fit_transform([[6.5]]))
2. print(poly_pred)

Output:

[158862.45265153]

As we can see, the predicted output of the Polynomial Regression model is [158862.45265153], which is much
closer to the real value; hence, we can say that the future employee is telling the truth.
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a binary or
discrete format such as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True
or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it is different from the linear regression algorithm
in terms of how it is used.
o Logistic regression uses the sigmoid function, or logistic function, to model the data. The function can be
represented as:

f(x) = 1 / (1 + e^-x)

o f(x) = output between 0 and 1
o x = input to the function
o e = base of the natural logarithm

When we provide the input values (data) to the function, it gives an S-curve as follows:

o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and
values below the threshold level are rounded down to 0, as illustrated in the sketch below.
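
A minimal sketch of the sigmoid function and the threshold idea described above (the input values and the 0.5 threshold are assumptions for illustration):

1. import numpy as nm
2. def sigmoid(x):
3.     # logistic function: maps any real value into the range (0, 1)
4.     return 1/(1 + nm.exp(-x))
5. inputs= nm.array([-3.0, -0.5, 0.0, 1.0, 4.0])    # assumed example inputs
6. probabilities= sigmoid(inputs)
7. labels= (probabilities >= 0.5).astype(int)       # threshold of 0.5
8. print(probabilities)
9. print(labels)      # values above the threshold become 1, below become 0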

There are three types of logistic regression:

o Binary(0/1, pass/fail)
o Multi(cats, dogs, lions)
o Ordinal(low, medium, high)

Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-linear dataset using a linear
model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the value of x and
corresponding conditional values of y.
o Suppose there is a dataset which consists of datapoints which are present in a non-linear fashion,
so for such case, linear regression will not best fit to those datapoints. To cover such datapoints,
we need Polynomial regression.
o In Polynomial regression, the original features are transformed into polynomial features of
given degree and then modeled using a linear model. Which means the datapoints are best
fitted using a polynomial line.

o The equation for polynomial regression is also derived from the linear regression equation: the
linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y =
b0 + b1x + b2x² + b3x³ + ..... + bnxⁿ.
o Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is
our independent/input variable.
o The model is still linear because the coefficients enter the equation linearly, even though the
features include quadratic and higher-degree terms.

Note: This is different from Multiple Linear regression in such a way that in Polynomial regression, a
single element has different degrees instead of multiple variables with the same degree.

Support Vector Regression:

Support Vector Machine is a supervised learning algorithm which can be used for regression as well as
classification problems. So if we use it for regression problems, then it is termed as Support Vector
Regression.

Support Vector Regression is a regression algorithm which works for continuous variables. Below are
some keywords which are used in Support Vector Regression:

o Kernel: A function used to map lower-dimensional data into higher-dimensional data.
o Hyperplane: In a general SVM, it is the separation line between two classes, but in SVR, it is the line
which helps to predict the continuous variable and covers most of the data points.
o Boundary line: The two lines on either side of the hyperplane, which create a margin
for the data points.
o Support vectors: The data points which are nearest to the hyperplane and define the margin.
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of
data points is covered within that margin. The main goal of SVR is to include the maximum number of data points
within the boundary lines, and the hyperplane (best fit line) must cover the maximum number of
data points. Consider the below image:

Here, the blue line is called hyperplane, and the other two lines are known as boundary lines.
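
A short hedged sketch of Support Vector Regression with scikit-learn (the toy data, kernel, and parameter values are assumptions; epsilon controls the width of the margin created by the boundary lines around the hyperplane):

1. import numpy as nm
2. from sklearn.svm import SVR
3. x= nm.arange(1, 11).reshape(-1, 1).astype(float)      # assumed toy feature values
4. y= nm.array([1.2, 1.9, 3.2, 4.8, 7.1, 9.8, 13.2, 17.1, 21.4, 26.3])   # assumed non-linear targets
5. # The RBF kernel maps the data into a higher-dimensional space;
6. # epsilon sets the margin (boundary lines) around the predicted hyperplane
7. svr= SVR(kernel='rbf', C=100, epsilon=0.5)
8. svr.fit(x, y)
9. print(svr.predict([[6.5]]))     # prediction for an unseen value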

Decision Tree Regression:


o Decision Tree is a supervised learning algorithm which can be used for solving both classification
and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision Tree regression builds a tree-like structure in which each internal node represents a
"test" on an attribute, each branch represents the result of the test, and each leaf node represents
the final decision or result.
o A decision tree is constructed starting from the root node/parent node (dataset), which splits into
left and right child nodes (subsets of the dataset). These child nodes are further divided into their own
children nodes and themselves become the parent nodes of those nodes. Consider the below image:
The above image shows an example of Decision Tree regression, where the model is trying to predict a
person's choice between a sports car and a luxury car.
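
A brief hedged sketch of decision tree regression with scikit-learn (the toy data and max_depth value are assumptions for illustration):

1. import numpy as nm
2. from sklearn.tree import DecisionTreeRegressor
3. x= nm.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)    # assumed feature values
4. y= nm.array([5.0, 5.2, 5.1, 20.0, 20.3, 19.8, 41.0, 40.5])            # assumed step-like targets
5. # The tree repeatedly splits the feature range; each leaf node stores
6. # the mean target value of the training points that fall into it
7. tree= DecisionTreeRegressor(max_depth=2, random_state=0)
8. tree.fit(x, y)
9. print(tree.predict([[3.5], [7.5]]))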

Random Forest Regression:
o Random forest is one of the most powerful supervised learning algorithms, capable of
performing regression as well as classification tasks.
o Random Forest regression is an ensemble learning method which combines multiple decision
trees and predicts the final output based on the average of each tree's output. The combined
decision trees are called base models, and it can be represented more formally as:

g(x)= f0(x)+ f1(x)+ f2(x)+....


o Random forest uses the Bagging or Bootstrap Aggregation technique of ensemble learning, in
which the aggregated decision trees run in parallel and do not interact with each other.
o With the help of Random Forest regression, we can prevent overfitting in the model by creating
random subsets of the dataset.
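
A minimal hedged sketch of Random Forest regression with scikit-learn, reflecting the bagging and averaging described above (the toy data and n_estimators value are assumptions):

1. import numpy as nm
2. from sklearn.ensemble import RandomForestRegressor
3. x= nm.arange(1, 11).reshape(-1, 1).astype(float)     # assumed toy feature values
4. y= nm.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.2, 19.9])  # assumed targets
5. # Each of the 100 trees is trained on a bootstrap sample of the data,
6. # and the forest's prediction is the average of the individual tree outputs
7. forest= RandomForestRegressor(n_estimators=100, random_state=0)
8. forest.fit(x, y)
9. print(forest.predict([[5.5]]))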
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression in which a small amount
of bias is introduced so that we can get better long term predictions.
o The amount of bias added to the model is known as the Ridge Regression penalty. We can compute
this penalty term by multiplying lambda by the squared weight of each individual feature.
o The equation (cost function) for ridge regression will be:

Cost = Σ(y − ŷ)² + λ Σ(weight)²

o A general linear or polynomial regression will fail if there is high collinearity between the
independent variables; to solve such problems, Ridge regression can be used.
o Ridge regression is a regularization technique which is used to reduce the complexity of the
model. It is also called L2 regularization.
o It helps to solve problems where we have more parameters than samples.
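
A short hedged sketch of ridge regression with scikit-learn (the toy data are assumptions; the alpha parameter plays the role of lambda in the penalty term described above):

1. import numpy as nm
2. from sklearn.linear_model import Ridge
3. x= nm.array([[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2], [5, 10.1]])   # assumed data with two correlated features
4. y= nm.array([3.0, 5.1, 6.9, 9.2, 11.0])
5. # alpha (lambda) scales the penalty on the squared weights
6. ridge= Ridge(alpha=1.0)
7. ridge.fit(x, y)
8. print(ridge.coef_, ridge.intercept_)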

Lasso Regression:
o Lasso regression is another regularization technique to reduce the complexity of the model.
o It is similar to Ridge Regression, except that the penalty term contains only the absolute values of the
weights instead of the squares of the weights.
o Since it takes absolute values, it can shrink the slope to exactly 0, whereas Ridge Regression can
only shrink it close to 0.
o It is also called L1 regularization. The equation (cost function) for Lasso regression will be:

Cost = Σ(y − ŷ)² + λ Σ|weight|
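
A comparable hedged sketch for lasso regression (the data and alpha value are assumptions); note that, unlike ridge, the L1 penalty can shrink some coefficients to exactly zero:

1. import numpy as nm
2. from sklearn.linear_model import Lasso
3. x= nm.array([[1, 0.1], [2, 0.4], [3, 0.2], [4, 0.5], [5, 0.3]])    # assumed data: the second feature is mostly noise
4. y= nm.array([2.0, 4.1, 6.0, 8.2, 10.1])
5. # The absolute-value (L1) penalty can set uninformative weights to exactly 0
6. lasso= Lasso(alpha=0.5)
7. lasso.fit(x, y)
8. print(lasso.coef_)      # the coefficient of the noisy feature is likely to be exactly 0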
Logistic Regression in Machine Learning

o Logistic regression is one of the most popular Machine Learning algorithms, which comes under
the Supervised Learning technique. It is used for predicting the categorical dependent variable
using a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome
must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or False, etc. but
instead of giving the exact value as 0 and 1, it gives the probabilistic values which lie between
0 and 1.
o Logistic Regression is quite similar to Linear Regression except in how it is used:
Linear Regression is used for solving regression problems, whereas Logistic Regression is used
for solving classification problems.
o In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic function,
which predicts two maximum values (0 or 1).
o The curve from the logistic function indicates the likelihood of something such as whether the
cells are cancerous or not, a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
o Logistic Regression can be used to classify the observations using different types of data and can
easily determine the most effective variables used for the classification. The below image is
showing the logistic function:

Note: Logistic regression uses the concept of predictive modelling as regression, which is why it is called
logistic regression; but because it is used to classify samples, it falls under the classification algorithms.

Logistic Function (Sigmoid Function):


o The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The output of logistic regression must be between 0 and 1 and cannot go beyond this limit,
so it forms a curve like the "S" form. The S-form curve is called the Sigmoid function or the
logistic function.
o In logistic regression, we use the concept of a threshold value, which defines the probability of
either 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0.
Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variable should not have multi-collinearity.

Logistic Regression Equation:

The Logistic regression equation can be obtained from the Linear Regression equation. The mathematical
steps to get Logistic Regression equations are given below:

o We know the equation of a straight line can be written as:

y = b0 + b1x1 + b2x2 + ..... + bnxn

o In Logistic Regression, y can be between 0 and 1 only, so let's divide the above equation
by (1-y):

y / (1 − y)   ; 0 for y = 0, and infinity for y = 1

o But we need a range between -[infinity] and +[infinity], so taking the logarithm of the equation gives:

log[y / (1 − y)] = b0 + b1x1 + b2x2 + ..... + bnxn

The above equation is the final equation for Logistic Regression.

Type of Logistic Regression:

On the basis of the categories, Logistic Regression can be classified into three types:

o Binomial: In binomial Logistic regression, there can be only two possible types of the dependent
variables, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered
types of the dependent variable, such as "cat", "dogs", or "sheep"
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as "low", "Medium", or "High".

Python Implementation of Logistic Regression (Binomial)

To understand the implementation of Logistic Regression in Python, we will use the below example:

Example: There is a dataset which contains information about various users, obtained
from a social networking site. A car manufacturing company has recently launched a new SUV
car, and the company wants to check how many users from the dataset want to purchase the car.

For this problem, we will build a Machine Learning model using the Logistic regression algorithm. The
dataset is shown in the below image. In this problem, we will predict the purchased variable
(Dependent Variable) by using age and salary (Independent variables).
Steps in Logistic Regression: To implement the Logistic Regression using Python, we will use the same
steps as we have done in previous topics of Regression. Below are the steps:

o Data Pre-processing step


o Fitting Logistic Regression to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.

1. Data Pre-processing step: In this step, we will pre-process/prepare the data so that we can use it in our
code efficiently. It will be the same as we have done in Data pre-processing topic. The code for this is
given below:

1. #Data Pre-procesing Step


2. # importing libraries
3. import numpy as nm
4. import matplotlib.pyplot as mtp
5. import pandas as pd
6.
7. #importing datasets
8. data_set= pd.read_csv('user_data.csv')

By executing the above lines of code, we will get the dataset as the output. Consider the given image:
Now, we will extract the dependent and independent variables from the given dataset. Below is the code
for it:

#Extracting Independent and dependent Variable

1. x= data_set.iloc[:, [2,3]].values
2. y= data_set.iloc[:, 4].values

In the above code, we have taken [2, 3] for x because our independent variables are age and salary, which
are at index 2, 3. And we have taken 4 for y variable because our dependent variable is at index 4. The
output will be:
Now we will split the dataset into a training set and test set. Below is the code for it:

1. # Splitting the dataset into training and test set.


2. from sklearn.model_selection import train_test_split
3. x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)

The output for this is given below:

For test

set:

For training set:


In logistic regression, we will do feature scaling because we want accurate prediction results. Here we
will only scale the independent variables, because the dependent variable has only 0 and 1 values. Below is
the code for it:

1. #feature Scaling
2. from sklearn.preprocessing import StandardScaler
3. st_x= StandardScaler()
4. x_train= st_x.fit_transform(x_train)
5. x_test= st_x.transform(x_test)

The scaled output is given below:


2. Fitting Logistic Regression to the Training set:

We have well prepared our dataset, and now we will train the dataset using the training set. For providing
training or fitting the model to the training set, we will import the LogisticRegression class of
the sklearn library.

After importing the class, we will create a classifier object and use it to fit the logistic regression model to
the training set. Below is the code for it:

1. #Fitting Logistic Regression to the training set


2. from sklearn.linear_model import LogisticRegression
3. classifier= LogisticRegression(random_state=0)
4. classifier.fit(x_train, y_train)

Output: By executing the above code, we will get the below output:

Out[5]:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,


intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=0, solver='warn', tol=0.0001, verbose=0,
warm_start=False)

Hence our model is well fitted to the training set.

3. Predicting the Test Result

Our model is well trained on the training set, so we will now predict the result by using test set data.
Below is the code for it:

1. #Predicting the test set result


2. y_pred= classifier.predict(x_test)

In the above code, we have created a y_pred vector to predict the test set result.

Output: By executing the above code, a new vector (y_pred) will be created under the variable explorer
option. It can be seen as:
The above output image shows the corresponding predicted users who want to purchase or not purchase
the car.

4. Test Accuracy of the result

Now we will create the confusion matrix to check the accuracy of the classification. To create it, we
need to import the confusion_matrix function of the sklearn library. After importing the function, we
will call it and store the result in a new variable cm. The function takes two main parameters, y_true (the actual values)
and y_pred (the values predicted by the classifier). Below is the code for it:

1. #Creating the Confusion matrix
2. from sklearn.metrics import confusion_matrix
3. cm= confusion_matrix(y_test, y_pred)

Output:

By executing the above code, a new confusion matrix will be created. Consider the below image:
We can find the accuracy of the predicted result by interpreting the confusion matrix. From the above output,
we can see that 65+24= 89 predictions are correct and 8+3= 11 predictions are incorrect.
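
As an optional check (a hedged sketch that assumes the cm, y_test, and y_pred variables created in the steps above), the accuracy can also be computed directly:

1. from sklearn.metrics import accuracy_score
2. print(accuracy_score(y_test, y_pred))     # correct predictions / total predictions, about 89/100 = 0.89 here
3. print(cm.trace()/cm.sum())                # the same value, read off the confusion matrix diagonal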

5. Visualizing the training set result

Finally, we will visualize the training set result. To visualize the result, we will
use ListedColormap class of matplotlib library. Below is the code for it:

#Visualizing the training set result

1. from matplotlib.colors import ListedColormap


2. x_set, y_set = x_train, y_train
3. x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),
4. nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
5. mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
6. alpha = 0.75, cmap = ListedColormap(('purple','green' )))
7. mtp.xlim(x1.min(), x1.max())
8. mtp.ylim(x2.min(), x2.max())
9. for i, j in enumerate(nm.unique(y_set)):
10. mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
11. c = ListedColormap(('purple', 'green'))(i), label = j)
12. mtp.title('Logistic Regression (Training set)')
13. mtp.xlabel('Age')
14. mtp.ylabel('Estimated Salary')
15. mtp.legend()
16. mtp.show()

In the above code, we have imported the ListedColormap class of the Matplotlib library to create the
colormap for visualizing the result. We have created two new variables, x_set and y_set, to
replace x_train and y_train. After that, we have used the nm.meshgrid command to create a rectangular
grid which ranges from the minimum value minus 1 to the maximum value plus 1 of each feature. The pixel
points we have taken are of 0.01 resolution.
To create a filled contour, we have used the mtp.contourf command; it creates regions of the provided colors
(purple and green). In this function, we have passed classifier.predict to show the data
points predicted by the classifier.

Output: By executing the above code, we will get the below output:

The graph can be explained in the below points:

o In the above graph, we can see that there are some Green points within the green region
and Purple points within the purple region.
o All these data points are the observation points from the training set, which shows the result for
purchased variables.
o This graph is made by using two independent variables i.e., Age on the x-axis and Estimated
salary on the y-axis.
o The purple point observations are for which purchased (dependent variable) is probably 0, i.e.,
users who did not purchase the SUV car.
o The green point observations are for which purchased (dependent variable) is probably 1 means
user who purchased the SUV car.
o We can also estimate from the graph that younger users with a low salary did not purchase the car,
whereas older users with a high estimated salary purchased the car.
o But there are some purple points in the green region (buying the car) and some green points in
the purple region (not buying the car). These are observations that do not follow the general trend:
for instance, a younger user with a high estimated salary who purchased the car, or an older user
with a low estimated salary who did not.

The goal of the classifier:

We have successfully visualized the training set result for logistic regression. Our goal for this
classification is to separate the users who purchased the SUV car from those who did not. From
the output graph we can clearly see the two regions (purple and green) with the observation points: the
purple region is for the users who didn't buy the car, and the green region is for the users who
purchased the car.

Linear Classifier:

As we can see from the graph, the classifier is a straight line, i.e., linear in nature, as we have used the
linear model for Logistic Regression. In further topics, we will learn about non-linear classifiers.
Visualizing the test set result:

Our model is well trained using the training dataset. Now, we will visualize the result for new
observations (the test set). The code for the test set will remain the same as above, except that here we will
use x_test and y_test instead of x_train and y_train. Below is the code for it:

1. #Visulaizing the test set result


2. from matplotlib.colors import ListedColormap
3. x_set, y_set = x_test, y_test
4. x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),
5. nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
6. mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
7. alpha = 0.75, cmap = ListedColormap(('purple','green' )))
8. mtp.xlim(x1.min(), x1.max())
9. mtp.ylim(x2.min(), x2.max())
10. for i, j in enumerate(nm.unique(y_set)):
11. mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
12. c = ListedColormap(('purple', 'green'))(i), label = j)
13. mtp.title('Logistic Regression (Test set)')
14. mtp.xlabel('Age')
15. mtp.ylabel('Estimated Salary')
16. mtp.legend()
17. mtp.show()

Output:

The above graph shows the test set result. As we can see, the graph is divided into two regions (purple
and green). Green observations are in the green region, and purple observations are in the purple region,
so we can say it is a good prediction and a good model. Some of the green and purple data points lie in
the wrong regions, which can be accepted because we have already counted this error using the confusion
matrix (11 incorrect outputs).

Hence our model is pretty good and ready to make new predictions for this classification problem.

What is Polynomial Regression?


In polynomial regression, the relationship between the independent variable x and the dependent variable
y is described as an nth degree polynomial in x. Polynomial regression fits a nonlinear relationship
between the value of x and the conditional mean of y, denoted E(y|x). It usually corresponds to the
least-squares method; according to the Gauss-Markov theorem, the least-squares approach minimizes the
variance of the coefficient estimates. It is a type of Linear Regression in which the dependent and
independent variables have a curvilinear relationship and a polynomial equation is fitted to the data.

Types of Polynomial Regression


A quadratic equation is the general term for a second-degree polynomial equation, but the degree can go
up to any nth value. Polynomial regression can thus be categorized as follows:

1. Linear – if the degree is 1

2. Quadratic – if the degree is 2

3. Cubic – if the degree is 3, and so on, on the basis of the degree.

Assumption of Polynomial Regression


We cannot take every dataset and use polynomial regression to make a better judgment. We can still try,
but the dataset should satisfy specific constraints in order to get the best polynomial regression results:

o The behaviour of the dependent variable can be explained by a linear, curvilinear, or additive
relationship between the dependent variable and a set of k independent variables.
o The independent variables have no relationship with one another.
o The errors are independent, normally distributed with a mean of zero, and have a constant
variance.

Simple math to understand Polynomial Regression

Here we are dealing with mathematics; rather than going deep, just understand the basic structure. We all
know that the equation of a linear model is a straight line. If we have many features, we opt for multiple
regression, which just increases the number of features. Polynomial regression, however, is not about
adding more features but about changing the structure of the equation to a quadratic (or higher-degree)
one, as you can see visually in the diagram.
Linear Regression Vs Polynomial Regression

Rather than focusing on the distinctions between linear and polynomial regression, we may understand
the importance of polynomial regression by starting with linear regression. We build our model and
realize that it performs abysmally. We examine the difference between the actual values and the best fit
line we predicted, and it appears that the true values form a curve on the graph, but our line is nowhere near
the mean of the points. This is where polynomial regression comes into play: it predicts the best-
fit line that matches the pattern (curve) of the data.
One important distinction between Linear and Polynomial Regression is that Polynomial Regression does
not require a linear relationship between the independent and dependent variables in the data set. When
the Linear Regression model fails to capture the points in the data and fails to
adequately represent the optimum result, Polynomial Regression is used.
Overfitting Vs Under-fitting

If we keep on increasing the degree, we will see apparently better and better results, but then comes the
over-fitting problem, where the r2 value for a particular degree reaches 100 per cent.

When analyzing a non-linear dataset with a linear model, we encounter an under-fitting problem, which can
be corrected using polynomial regression. However, if we push the degree parameter too far, we
encounter an over-fitting problem, resulting in a 100 per cent r2 value on the training data. The conclusion
is that we must avoid both overfitting and underfitting issues.
