Regression in Data Mining
Module 3
Regression
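[Figure: regression line with the dependent variable (y) on the vertical axis and the independent variable (x) on the horizontal axis, both starting at zero. Labels: b0 = y-intercept; b1 = slope = $\Delta y / \Delta x$; an observation y.]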
The function will make a prediction for each observed data point.
The observation is denoted by $y$ and the prediction is denoted by $\hat{y}$.
Simple Linear Regression
Prediction error: $\varepsilon$
Observation: $y$
Prediction: $\hat{y}$
$y = \hat{y} + \varepsilon$
Actual = Explained + Error
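The decomposition Actual = Explained + Error can be checked numerically. The following is a minimal sketch, assuming the line is fitted by ordinary least squares (the slides do not name the fitting method); the data values and the helper name fit_line are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = co-variation of x and y divided by variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


b0, b1 = fit_line(x, y)

# One prediction per observed data point, and the matching prediction error.
y_hat = [b0 + b1 * xi for xi in x]
errors = [yi - yh for yi, yh in zip(y, y_hat)]

for yi, yh, e in zip(y, y_hat, errors):
    print(f"actual={yi:.2f}  explained={yh:.2f}  error={e:+.2f}")
```

Each printed row satisfies actual = explained + error by construction, and for a least-squares line with an intercept the errors sum to zero (up to rounding).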
Regression
Mathematically,
$SSR = \sum (\hat{y} - \bar{y})^2$ (measure of explained variation)
$SSE = \sum (y - \hat{y})^2$ (measure of unexplained variation)
$SST = SSR + SSE = \sum (y - \bar{y})^2$ (measure of total variation in y)
where $\bar{y}$ is the mean of the observed y values.
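The three sums of squares can be computed directly from observations and predictions. This is a sketch under the assumption of an ordinary least-squares fit with an intercept, which is the case in which SST = SSR + SSE holds exactly; the data and variable names are hypothetical.

```python
# Hypothetical sample data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least-squares slope and intercept (an assumed fitting method).
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean
y_hat = [intercept + slope * xi for xi in x]

# Explained, unexplained, and total variation.
ssr = sum((yh - y_mean) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - y_mean) ** 2 for yi in y)

print(f"SSR={ssr:.3f}  SSE={sse:.3f}  SST={sst:.3f}")
```

For predictions that do not come from a least-squares fit with an intercept, SSR + SSE generally does not equal SST.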
The Coefficient of Determination
$R^2 = \frac{SSR}{SST} = \frac{SSR}{SSR + SSE}$
The value of $R^2$ ranges between 0 and 1; the higher its value, the larger
the share of the variation in y that the regression model explains. It is
often expressed as a percentage.
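As an illustration, the coefficient of determination can be computed as SSR / SST. The sketch below uses a hypothetical helper r_squared and hypothetical data; the predictions are assumed to come from a least-squares line fitted to those observations.

```python
def r_squared(y, y_hat):
    """Return SSR / SST for observations y and predictions y_hat."""
    y_mean = sum(y) / len(y)
    ssr = sum((yh - y_mean) ** 2 for yh in y_hat)  # explained variation
    sst = sum((yi - y_mean) ** 2 for yi in y)      # total variation
    return ssr / sst


# Hypothetical observations and their least-squares predictions.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
y_hat = [2.06, 4.03, 6.00, 7.97, 9.94]

r2 = r_squared(y, y_hat)
print(f"R^2 = {r2:.4f} ({r2:.1%} of the variation in y is explained)")
```

The ratio stays within 0 and 1 only when the predictions come from a least-squares fit that includes an intercept.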
Standard Error of Regression
$\text{Standard Error} = \sqrt{\frac{SSE}{n - k}}$
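A minimal sketch of the standard error calculation follows. It assumes the formula is the square root of SSE / (n - k), that n is the number of observations, and that k is the number of estimated parameters including the intercept (so k = 2 for a simple linear regression); the slides do not define n and k, and the data and the helper name standard_error are hypothetical.

```python
import math


def standard_error(y, y_hat, k):
    """Return sqrt(SSE / (n - k)) for observations y and predictions y_hat.

    k is assumed to count every estimated parameter, intercept included.
    """
    n = len(y)
    if n <= k:
        raise ValueError("need more observations than estimated parameters")
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - k))


# Hypothetical observations and their least-squares predictions.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
y_hat = [2.06, 4.03, 6.00, 7.97, 9.94]

se = standard_error(y, y_hat, k=2)
print(f"standard error = {se:.4f}")
```

The result is in the same units as y, so it can be read as a typical size of the prediction error.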
$y = A + \beta x + \varepsilon$
$\beta$ is the per unit change in the dependent variable for each unit
change in the independent variable. Mathematically:
$\beta = \frac{\Delta y}{\Delta x}$
Multiple Linear Regression
$y = A + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
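The coefficients of a multiple linear regression can be estimated in several ways. The sketch below assumes ordinary least squares solved through the normal equations with plain Gaussian elimination; the helper names solve and fit_multiple and the data are hypothetical, and a numerical library would normally be preferred for real data.

```python
def solve(a, b):
    """Solve the square linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_multiple(xs, y):
    """Least-squares fit of y = A + b1*X1 + ... + bk*Xk.

    xs is a list of rows [X1, ..., Xk]; returns [A, b1, ..., bk].
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*X1 + 3*X2 (no error term).
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

a, b1, b2 = fit_multiple(xs, y)
print(f"A={a:.3f}  b1={b1:.3f}  b2={b2:.3f}")
```

Because the data contain no error term, the fit recovers the generating coefficients up to rounding; each coefficient is the change in y per unit change in its own variable with the other variables held fixed.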
Example table of correlations

      Y      X1     X2
Y     1.000
X1    0.802  1.000
X2    0.848  0.578  1.000
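A lower-triangular correlation table of this shape can be produced from raw columns. The sketch below assumes the entries are Pearson correlation coefficients (the slides do not say which coefficient is used); the data are hypothetical and do not reproduce the values in the table, and pearson is an illustrative helper.

```python
import math


def pearson(u, v):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    cov = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    su = math.sqrt(sum((a - u_mean) ** 2 for a in u))
    sv = math.sqrt(sum((b - v_mean) ** 2 for b in v))
    return cov / (su * sv)


# Hypothetical columns (not the data behind the table in the slides).
columns = {
    "Y": [10.0, 12.0, 15.0, 19.0, 22.0, 24.0],
    "X1": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    "X2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
}
names = list(columns)
matrix = [[pearson(columns[r], columns[c]) for c in names] for r in names]

# Print only the lower triangle, since the matrix is symmetric.
print("     " + "  ".join(f"{name:>6}" for name in names))
for i, name in enumerate(names):
    cells = "  ".join(f"{matrix[i][j]:6.3f}" for j in range(i + 1))
    print(f"{name:<5}{cells}")
```

The diagonal is always 1 because each variable is perfectly correlated with itself. In a table like this, the correlation between X1 and X2 is worth checking before a multiple regression, since strongly correlated independent variables make individual coefficients harder to interpret.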