Aiml M3 C3
Regression Analysis
5.1 INTRODUCTION TO REGRESSION
• Regression analysis is a premier method of supervised
learning.
• It is one of the oldest and most popular supervised
learning techniques.
• Given a training dataset D containing N training points (xi, yi),
where i = 1...N, regression analysis is used to model the
relationship between one or more independent variables xi and
a dependent variable yi.
• The relationship between the dependent and independent
variables can be represented as a function as follows:
y = f(x)
Here, the feature variable x is also known as an explanatory
variable, a predictor variable, an independent variable, a
covariate, or a domain point.
y is the dependent variable. Dependent variables are also
called labels, target variables, or response variables.
Regression:
• A regression model determines the relationship between an independent
variable and a dependent variable by providing a function.
• Formulating a regression analysis helps you predict the effects of
the independent variable on the dependent one.
• Example: we can say that age and height can be described using
a linear regression model.
• Since a person's height increases with age, the two have an
approximately linear relationship.
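The age/height example can be sketched as follows. The slope and intercept below are assumed values for illustration only, not real growth data; the point is that a linear model adds the same increment for every unit increase in the input.

```python
# Assumed linear model for the age/height example (illustrative values only):
# predicted height (cm) = 6.4 * age (years) + 74.0
slope, intercept = 6.4, 74.0

def height(age):
    return slope * age + intercept

# Linearity: each extra year of age adds the same increment to the prediction.
diffs = [round(height(a + 1) - height(a), 2) for a in range(2, 6)]
print(diffs)  # → [6.4, 6.4, 6.4, 6.4]
```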
For Understanding:
While correlation is about relationships among variables, say x and y, regression is about predicting one
variable given another variable.
Regression and Causation
• Causation is about causal relationship among variables, say x and
y.
• Causation means knowing whether x causes y to happen or vice
versa. x causes y is often denoted as x implies y.
• Correlation and regression relationships are not the same as
causation.
• For example, the correlation between economic background
and marks scored does not imply that economic background
causes high marks.
• Similarly, the relationship between higher sales of cool drinks
and a rise in temperature is not necessarily a causal relation.
• Even though high temperature contributes to cool drink sales,
the sales depend on other factors too.
Linearity and Non-linearity Relationships
• The linearity relationship between the variables means the
relationship between the dependent and independent variables
can be visualized as a straight line.
• The line of the form, y = ax + b can be fitted to the data points
that indicate the relationship between x and y.
• By linearity, it is meant that as one variable changes, the
other changes with it at a constant rate, tracing a straight line.
• A linear relationship is shown in Figure 5.2(a).
• A non-linear relationship exists in functions such as exponential
function and power function and it is shown in Figures 5.2 (b)
and 5.2 (c).
• In these figures, the x-axis represents the x data and the
y-axis represents the y data.
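A non-linear relationship such as a power function can often be linearized by a transformation, which is one way to see the difference between the two cases. The constants below are chosen for illustration:

```python
import math

# A power relationship y = c * x**p looks non-linear on the original axes,
# but taking logs linearizes it: log(y) = log(c) + p * log(x).
c, p = 2.0, 1.5          # illustrative constants
xs = [1, 2, 4, 8]
ys = [c * x ** p for x in xs]

# On log-log axes the points fall on a straight line whose slope is p.
log_pairs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys)]
(x0, y0), (x1, y1) = log_pairs[0], log_pairs[-1]
slope = (y1 - y0) / (x1 - x0)
print(round(slope, 3))  # → 1.5, recovering the exponent p
```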
Types of Regression Methods
• The classification of regression methods is shown in Figure 5.3.
Linear Regression
It is a type of regression where a line is fitted to the given data
to find the linear relationship between one independent variable and
one dependent variable.
It creates a hypothetical line that best fits all the data points.
Syntax:
• y = θx + b
• where,
• θ – the model weight or parameter (the slope of the line)
• b – the bias (the intercept).
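One common way to learn θ and b is gradient descent on the squared error; the sketch below uses the slide's variable names on made-up points that lie exactly on y = 2x + 1, so the learned parameters should approach θ = 2 and b = 1. The learning rate and iteration count are assumptions chosen for this toy data.

```python
# Gradient-descent sketch for the model y = theta*x + b.
data = [(1, 3), (2, 5), (3, 7), (4, 9)]   # points on y = 2x + 1
theta, b, lr = 0.0, 0.0, 0.05             # start from zero weights

for _ in range(2000):
    n = len(data)
    # Gradients of the mean squared error with respect to theta and b
    grad_t = sum(2 * (theta * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (theta * x + b - y) for x, y in data) / n
    theta -= lr * grad_t
    b -= lr * grad_b

print(round(theta, 2), round(b, 2))  # → 2.0 1.0
```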
Multiple Regression
It is a type of regression where a linear function (a plane or
hyperplane rather than a line) is fitted to find the linear
relationship between two or more independent variables and one
dependent variable. Example: modelling salary as a function of
education, experience, and proximity to a metropolitan area.
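The salary example can be sketched with the normal equations (XᵀX)w = Xᵀy, solved here by a small Gaussian elimination. The data is synthetic, generated from assumed weights (salary = 3·education + 2·experience + 20 in made-up units), so the fit should recover those weights exactly:

```python
# Multiple regression via the normal equations, solved with Gaussian elimination.
rows = [(12, 1), (16, 3), (12, 5), (18, 2), (14, 8)]   # (education, experience)
y = [3 * e + 2 * x + 20 for e, x in rows]              # synthetic salaries
X = [[e, x, 1.0] for e, x in rows]                     # last column = intercept

# Build A = X^T X and v = X^T y
n = 3
A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(n)]
     for i in range(n)]
v = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(n)]

# Forward elimination (A is symmetric positive definite, so no pivoting needed)
for i in range(n):
    piv = A[i][i]
    for j in range(i + 1, n):
        f = A[j][i] / piv
        A[j] = [a - f * c for a, c in zip(A[j], A[i])]
        v[j] -= f * v[i]

# Back-substitution
w = [0.0] * n
for i in reversed(range(n)):
    w[i] = (v[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]

print([round(c, 2) for c in w])  # → [3.0, 2.0, 20.0]
```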
Polynomial Regression
It is a type of non-linear regression method where an Nth-degree
polynomial is used to model the relationship between one
independent variable and one dependent variable.
Polynomial multiple regression is used to model two or more
independent variables and one dependent variable.
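A small sketch of the idea: an Nth-degree fit treats the powers of x as separate inputs, reducing polynomial regression to linear regression on expanded features. The three points below are made up to lie exactly on y = 1 + 2x + 3x², so solving the resulting system recovers those coefficients:

```python
# Polynomial regression as linear regression on expanded features.
def poly_features(x, degree):
    # Expand x into [x, x**2, ..., x**degree]
    return [x ** d for d in range(1, degree + 1)]

# Three points determine a quadratic y = a0 + a1*x + a2*x**2 exactly.
pts = [(0, 1), (1, 6), (2, 17)]           # from y = 1 + 2x + 3x^2
a0 = pts[0][1]                             # y(0) = a0
# y(1) = a0 + a1 + a2 and y(2) = a0 + 2*a1 + 4*a2, solved by substitution:
a2 = (pts[2][1] - a0 - 2 * (pts[1][1] - a0)) / 2
a1 = pts[1][1] - a0 - a2
print(a0, a1, a2)  # → 1 2.0 3.0
```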
Logistic Regression
It is used for predicting categorical variables that involve one or
more independent variables and one dependent variable. This is
also known as a binary classifier.
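The binary-classifier behaviour can be sketched with the sigmoid function, which maps the linear score to a probability; thresholding at 0.5 gives the predicted class. The weights below are hand-picked for illustration, not trained:

```python
import math

# Logistic regression: sigmoid squashes the linear score into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

theta, b = 1.5, -4.0   # hypothetical learned parameters

def predict(x):
    p = sigmoid(theta * x + b)        # probability of class 1
    return 1 if p >= 0.5 else 0       # threshold at 0.5 → binary class

print(predict(1), predict(5))  # → 0 1
```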
Lasso and Ridge Regression Methods
Ridge regression is another machine learning analysis you might
use when there’s a strong correlation between independent
variables. This means that as one independent variable
changes, others can change with it
Lasso regression, or least absolute shrinkage and selection
operator (LASSO), adds a regularization term to the objective
function that penalizes the size of the regression coefficients.
Both are special variants of regression in which regularization
is used to limit the size (and, for lasso, the number) of the
coefficients of the independent variables.
Multicollinearity is a statistical concept where several independent variables in a model are correlated.
It makes the model hard to interpret and can also cause overfitting.
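The difference between the two penalties can be sketched as follows; the data, the weight values, and the λ value are made up for illustration. Ridge adds λ·Σw² (an L2 penalty) to the squared error, while lasso adds λ·Σ|w| (an L1 penalty):

```python
# Ridge vs. lasso: same squared-error term, different penalty on the weights.
def sse(w, b, data):
    # Sum of squared errors for the line y = w*x + b
    return sum((w * x + b - y) ** 2 for x, y in data)

def ridge_loss(w, b, data, lam):
    return sse(w, b, data) + lam * w ** 2      # L2 penalty

def lasso_loss(w, b, data, lam):
    return sse(w, b, data) + lam * abs(w)      # L1 penalty

data = [(1, 2), (2, 4), (3, 6)]                # points on y = 2x
# The fit error is zero at w = 2, b = 0, so only the penalties differ:
print(ridge_loss(2.0, 0.0, data, 1.0), lasso_loss(2.0, 0.0, data, 1.0))  # → 4.0 2.0
```

Because the L1 penalty grows linearly, lasso tends to push small coefficients exactly to zero, which is why it also performs variable selection.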
5.3 INTRODUCTION TO LINEAR REGRESSION
• In linear regression, a line of the form y = a0 + a1x is fitted,
where a1 is the slope and a0 is the intercept.
• The intercept is the value of y when x = 0.
• The computation of this equation is shown step by step as:
first compute the means of x and y, then the slope a1, and
finally the intercept a0 = ȳ − a1x̄.
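The steps above can be sketched with the standard least-squares formulas; the data points are made up to lie exactly on y = 2x + 1, so the fit should recover a1 = 2 and a0 = 1:

```python
# Step-by-step least-squares fit of y = a0 + a1*x (illustrative data).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                    # exactly y = 2x + 1

# Step 1: means of x and y
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Step 2: slope a1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2)
a1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
     / sum((x - mean_x) ** 2 for x in xs)

# Step 3: intercept a0 = ȳ - a1*x̄  (the value of y when x = 0)
a0 = mean_y - a1 * mean_x

print(a1, a0)  # → 2.0 1.0
```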
5.4 VALIDATION OF REGRESSION METHODS
• The regression model should be evaluated using suitable metrics
to check its correctness. The following metrics are used to
validate the results of regression.
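The most common validation metrics (MAE, MSE, RMSE, and R²) can be sketched as follows on made-up predicted and actual values:

```python
import math

# Common regression validation metrics on illustrative data.
actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]

n = len(actual)
errors = [p - a for p, a in zip(predicted, actual)]

mae  = sum(abs(e) for e in errors) / n     # mean absolute error
mse  = sum(e ** 2 for e in errors) / n     # mean squared error
rmse = math.sqrt(mse)                      # root mean squared error

# R^2: 1 minus the ratio of residual error to total variance of the actuals
mean_a = sum(actual) / n
r2 = 1 - sum(e ** 2 for e in errors) / sum((a - mean_a) ** 2 for a in actual)

print(mae, mse, round(rmse, 4), round(r2, 4))  # → 0.25 0.125 0.3536 0.975
```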