Nptel Notes 2


Regression

• Regression is a supervised learning technique that builds a functional relationship between dependent and independent variables.

Examples: property rent price, stock price

Data Science for Engineers- TA Session 2


Regression Example
• House rent price prediction

Independent features (inputs):
• Location
• Size and area
• Transportation
• Furnished
• Utilities
• Pet allowed/not allowed

Model output (dependent feature): Rent is 10000 INR



Regression Types
Univariate Vs Multivariate
• Univariate: One dependent and one independent variable
• Multivariate: Multiple independent variables (and possibly multiple dependent variables)

Univariate example:
Square Footage (X)   House Price (Y)
1500                 $250,000
1800                 $280,000
1200                 $220,000
2000                 $320,000
1350                 $240,000

Multivariate example:
Square Footage (X1)   Bedrooms (X2)   House Price (Y)
1500                  3               $250,000
1800                  4               $280,000
1200                  2               $220,000
2000                  4               $320,000
1350                  3               $240,000
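Both tables can be fitted directly by least squares. The sketch below is a minimal illustration, assuming ordinary least squares solved through the normal equations $(X^T X)\,b = X^T y$ with a small hand-written Gaussian-elimination solver; `solve` and `fit_ols` are hypothetical helper names, not from the notes (in practice a library routine such as `numpy.linalg.lstsq` would be used).

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bp] for y ~ b0 + b1*x1 + ... + bp*xp."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


sqft = [1500, 1800, 1200, 2000, 1350]
bedrooms = [3, 4, 2, 4, 3]
price = [250_000, 280_000, 220_000, 320_000, 240_000]

uni = fit_ols([[s] for s in sqft], price)                           # [b0, b1]
multi = fit_ols([[s, b] for s, b in zip(sqft, bedrooms)], price)  # [b0, b1, b2]
print("univariate [b0, b1]:", uni)
print("multivariate [b0, b1, b2]:", multi)
```

Adding the bedrooms column can never increase the sum of squared errors on the training data, since the univariate model is the special case b2 = 0.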
Regression Types
Linear Vs Non-linear
• Linear: The relationship between the dependent and independent variables is linear
• Non-linear: The relationship between the dependent and independent variables is non-linear
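To see why the distinction matters, here is a small sketch on hypothetical data that follows y = x² exactly (a non-linear relationship). The best straight line is flat, and the residuals show a systematic U-shaped pattern instead of random scatter.

```python
# Hypothetical data: y = x**2 exactly (non-linear relationship)
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx           # slope of the best straight line
b0 = y_bar - b1 * x_bar  # intercept

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)     # 2.0 0.0
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The positive-negative-positive residual pattern is the signature of a non-linear relationship forced into a linear model.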



Regression Methods
Linear
• Ordinary Least Squares (OLS) Regression
• Ridge Regression (L2 Regularization)
• Lasso Regression (L1 Regularization)
• Partial Least Squares (PLS) Regression
• Principal Component Regression (PCR, built on Principal Component Analysis)

Non-linear
• Polynomial Regression
• Neural Network
• Spline Regression
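For a single predictor with an unpenalized intercept, Ridge and Lasso have closed forms, which makes the effect of L2 versus L1 regularization easy to see. This is a minimal sketch, assuming the objectives SSE + λβ₁² (Ridge) and SSE + λ|β₁| (Lasso); the scaling of λ differs between references and libraries, and `ridge_1d`/`lasso_1d` are hypothetical names. It reuses the square-footage/house-price table.

```python
def centered_sums(x, y):
    """Means plus the centered sums Sxx and Sxy."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, sxx, sxy


def ridge_1d(x, y, lam):
    """Minimize SSE + lam * b1**2 (intercept not penalized)."""
    x_bar, y_bar, sxx, sxy = centered_sums(x, y)
    b1 = sxy / (sxx + lam)  # L2 penalty shrinks the slope smoothly
    return y_bar - b1 * x_bar, b1


def lasso_1d(x, y, lam):
    """Minimize SSE + lam * |b1| (intercept not penalized) via soft-thresholding."""
    x_bar, y_bar, sxx, sxy = centered_sums(x, y)
    shrunk = max(abs(sxy) - lam / 2, 0.0)  # L1 penalty can set the slope exactly to 0
    b1 = (shrunk if sxy >= 0 else -shrunk) / sxx
    return y_bar - b1 * x_bar, b1


sqft = [1500, 1800, 1200, 2000, 1350]
price = [250_000, 280_000, 220_000, 320_000, 240_000]
for lam in (0.0, 1e6, 1e8, 2e8):
    print(lam, ridge_1d(sqft, price, lam), lasso_1d(sqft, price, lam))
```

With λ = 0 both reduce to OLS; as λ grows, Ridge only shrinks the slope toward zero while Lasso eventually sets it to exactly zero.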
Regression Process



Regression Illustration
We have a dataset from a car service center. It contains the number of cars (independent variable) and the minutes taken for service (dependent variable).

We want to find the best functional relationship between the two variables, which can be represented by a straight line.

[Figure: Minutes versus Cars]



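Ordinary Least Squares (OLS)

Linear model between $y_i$ and $x_i$, $i = 1, \dots, n$:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

Error only in the dependent variable and no error in the independent variable.

Error for each point: $\epsilon_i = y_i - \hat{y}_i$, where $\hat{y}_i = \beta_0 + \beta_1 x_i$

The sum of squares of errors (SSE):
$$SSE = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

Minimizing SSE gives the estimates of $\beta_0$ and $\beta_1$.

[Figure: Minutes versus Cars with errors $\epsilon_1, \dots, \epsilon_n$]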
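Setting the partial derivatives of SSE with respect to $\beta_0$ and $\beta_1$ to zero gives the standard closed-form estimates $\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. A minimal sketch follows; since the car service data is not reproduced in these notes, it uses the square-footage/house-price table instead, and `ols_fit`/`sse` are illustrative names.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors of the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Square footage (X) vs house price (Y)
x = [1500, 1800, 1200, 2000, 1350]
y = [250_000, 280_000, 220_000, 320_000, 240_000]

b0, b1 = ols_fit(x, y)
print(f"b0 = {b0:.1f}, b1 = {b1:.4f}")

# Moving either coefficient away from the estimate raises SSE
base = sse(x, y, b0, b1)
print(sse(x, y, b0 + 100, b1) > base, sse(x, y, b0, b1 + 0.1) > base)  # True True
```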



Testing goodness of fit
R² (the coefficient of determination) is one of the measures used to determine goodness of fit.
R² measures the proportion of the variability in the output variable that is explained by the input variable(s).

For an OLS fit with an intercept, the value of R² lies between 0 (poor fit) and 1 (good fit).

Adjusted R² is a modification of the R² metric that takes into account the number of independent variables.
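A minimal sketch of both metrics, using R² = 1 - SSE/SST and the common adjustment $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where n is the number of observations and p the number of independent variables. The data is the square-footage/house-price table; the function names are illustrative.

```python
def r_squared(y, y_hat):
    """R² = 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Penalize R² for the number of independent variables p (requires n > p + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


x = [1500, 1800, 1200, 2000, 1350]
y = [250_000, 280_000, 220_000, 320_000, 240_000]

# Simple OLS line
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

r2 = r_squared(y, y_hat)
adj = adjusted_r_squared(r2, n, p=1)
print(f"R² = {r2:.4f}, adjusted R² = {adj:.4f}")
```

Adjusted R² is always below R² when R² < 1, and the gap widens as more independent variables are added relative to the number of observations.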



In a linear regression equation, what does the slope (coefficient) represent?
a) The intercept of the regression line
b) The change in the dependent variable for a unit change in the independent variable
c) The average of the dependent variable
d) The variance of the dependent variable

Q) What does the coefficient of determination (R-squared) measure in a regression model?
a) The accuracy of the model's predictions
b) The proportion of variance explained by the model
c) The bias of the model
d) The standard error

We have the following data, for which we want to calculate the best fit in RStudio.




Q1) What is the slope (coefficient) of the best-fitting linear regression line for this dataset?
a) 2.1
b) 1.3
c) 1.7
d) 2.3

Q2) What is the intercept of the best-fitting linear regression line for this dataset?
a) 2.8
b) 1.8
c) 1.9
d) 2.9

Q3) What is the predicted value of Y when X = 6 using the linear regression model?
a) 10.1
b) 8.6
c) 11.3
d) 10.7

Q4) What is the R-squared (R²) of the linear regression model fitted to this dataset?
a) 0.55
b) 0.45
c) 0.35
d) 0.60

Q5) What is the mean squared error (MSE) of the linear regression model fitted to this dataset?
a) 3.49
b) 2.93
c) 3.98
d) 4.21
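The data table for Q1 to Q5 is not reproduced in these notes, so here is a generic sketch that computes all five quantities for any (x, y) data. It assumes MSE = SSE / n (some texts divide by n - 2 instead), and `summarize_fit` is a hypothetical name. The example uses hypothetical points lying exactly on y = 2x + 1, so the fit is perfect (SSE = 0, R² = 1).

```python
def summarize_fit(x, y, x_new):
    """Slope, intercept, prediction at x_new, R², and MSE of a simple OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "slope": slope,
        "intercept": intercept,
        "prediction": intercept + slope * x_new,
        "r2": 1 - sse / sst,
        "mse": sse / n,  # assumption: MSE = SSE / n
    }


# Hypothetical points on y = 2x + 1; replace with the slide's dataset
result = summarize_fit([1, 2, 3, 4], [3, 5, 7, 9], x_new=6)
print(result)  # {'slope': 2.0, 'intercept': 1.0, 'prediction': 13.0, 'r2': 1.0, 'mse': 0.0}
```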

Q) If the slope of the linear regression line is 3 and the intercept is 2, what would be the predicted Y value when X = 8?
a) 24
b) 26
c) 28
d) 30
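The prediction is just the line evaluated at the new input: $\hat{y} = \beta_0 + \beta_1 x = 2 + 3 \times 8 = 26$, i.e. option b). As a one-line sketch (`predict` is an illustrative name):

```python
def predict(x, slope, intercept):
    """Evaluate the fitted line y_hat = intercept + slope * x."""
    return intercept + slope * x


print(predict(8, slope=3, intercept=2))  # 26
```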
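Sum of Squares Definitions

• SST (total sum of squares): $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
• SSE (sum of squared errors, i.e. residual sum of squares): $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• SSR (regression sum of squares, the variation explained by the model): $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

• SST = SSE + SSR (holds for a least-squares fit with an intercept)

• R² = 1 - SSE / SST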
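A quick numerical check of the decomposition, as a minimal sketch on the square-footage/house-price table: for an OLS line with an intercept, SST equals SSE + SSR (up to floating-point rounding), so R² = 1 - SSE/SST is the same as SSR/SST.

```python
x = [1500, 1800, 1200, 2000, 1350]
y = [250_000, 280_000, 220_000, 320_000, 240_000]

# OLS line with an intercept
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)              # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (error) variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # variation explained by the line

print(f"SST = {sst:.1f}, SSE + SSR = {sse + ssr:.1f}")
print(f"1 - SSE/SST = {1 - sse / sst:.4f}, SSR/SST = {ssr / sst:.4f}")
```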



Which of the following formulas correctly calculates SSE (Sum of Squares Error)?
A) SSE = Σ(yᵢ- ȳ)²
B) SSE = Σ(yᵢ- ŷᵢ)²
C) SSE = Σ(ŷᵢ- ȳ)²

Q) Which equation relates SST, SSE, and SSR?
A) SST = SSE + SSR
B) SST = SSE - SSR
C) SST = SSE * SSR
D) SSE = SST - SSR

If the linear regression model perfectly fits the data, what would be the value of SSE?
A) 0
B) Equal to SST
C) Equal to SSR
D) Indeterminate

Q) What happens to R² when the regression model's fit improves?
A) R² decreases
B) R² increases
C) R² remains unchanged
D) R² becomes negative
Hypothesis test on regression coefficients
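A common test is the t-test of H0: β₁ = 0 (no linear relationship) using $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$, with $SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}$ and $\hat{\sigma}^2 = SSE / (n - 2)$. The sketch below applies it to the square-footage/house-price table; the critical value 3.182 is the two-sided 5% value of the t distribution with n - 2 = 3 degrees of freedom from a standard t table, so it only applies to this n = 5 example (a library such as `scipy.stats.t` would give it for any n).

```python
import math

x = [1500, 1800, 1200, 2000, 1350]
y = [250_000, 280_000, 220_000, 320_000, 240_000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2 = sse / (n - 2)           # estimated error variance
se_b1 = math.sqrt(sigma2 / sxx)  # standard error of the slope
t_stat = b1 / se_b1              # test statistic for H0: beta1 = 0

T_CRIT = 3.182  # two-sided 5% critical value, t distribution with 3 dof (valid for n = 5 only)
print(f"t = {t_stat:.2f}, reject H0 at 5%: {abs(t_stat) > T_CRIT}")
```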



Next step to check the linear fit



Residual plots
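A residual plot shows the residuals $y_i - \hat{y}_i$ against the fitted values $\hat{y}_i$. If the linear fit is adequate, the points scatter randomly around zero; a curved or funnel-shaped pattern suggests non-linearity or non-constant variance. A minimal text-only sketch on the square-footage/house-price table (with matplotlib installed, `plt.scatter(fitted, residuals)` followed by `plt.axhline(0)` would draw the usual plot):

```python
x = [1500, 1800, 1200, 2000, 1350]
y = [250_000, 280_000, 220_000, 320_000, 240_000]

# OLS line with an intercept
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Points of the residual plot, sorted by fitted value
for fi, ri in sorted(zip(fitted, residuals)):
    print(f"fitted = {fi:10.1f}   residual = {ri:9.1f}")
```

With an intercept in the model the residuals always sum to zero, so it is their pattern, not their average, that reveals problems with the fit.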
