Evaluation Metrics For Regression: Dr. Jasmeet Singh Assistant Professor, Csed Tiet, Patiala
Regression
Dr. JASMEET SINGH
ASSISTANT PROFESSOR, CSED
TIET, PATIALA
Regression Evaluation Metrics
The performance of a regression model is generally measured in terms of prediction error, i.e., the
difference between the actual values and the predicted values over all the instances in the test set.
The metrics covered here are:
1. Mean Absolute Error
2. Mean Squared Error
3. Root Mean Squared Error
4. R2 Score
5. Adjusted R2 Score
Mean Absolute Error
The Mean Absolute Error (MAE) is the average of all absolute errors, where the
absolute error is the absolute value of the difference between the predicted
value and the actual ("true") value.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for the i-th input of the test set,
and $n$ is the total number of test samples.
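The definition above can be sketched in a few lines of plain Python. The function name `mean_absolute_error` and the sample values are illustrative assumptions, not part of the slides.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all test samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not taken from the slides).
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 2.0 and 1.0, so the mean is 3.5 / 4.
mae = mean_absolute_error(actual, predicted)
print(mae)  # 0.875
```

Because each error enters only through its absolute value, MAE is expressed in the same units as y.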
Mean Squared Error
The mean squared error (MSE), or mean squared deviation (MSD), of
an estimator (a procedure for estimating an unobserved quantity) measures
the average of the squares of the errors, that is, the average squared difference
between the estimated values and the actual values.
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for the i-th input of the test set,
and $n$ is the total number of test samples.
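A minimal sketch of the MSE computation follows, assuming plain Python lists as input. The function name and both prediction sets are hypothetical; they are chosen so that the two sets have the same mean absolute error (1.0) but different MSE, which shows how squaring weights a single large error more heavily.

```python
def mean_squared_error(y_true, y_pred):
    """Average of (y_i - y_hat_i)^2 over all test samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: both prediction sets have a mean absolute error of 1.0.
actual = [10.0, 12.0, 14.0, 16.0]
small_errors = [11.0, 11.0, 15.0, 15.0]   # every prediction is off by 1
one_big_error = [10.0, 12.0, 14.0, 20.0]  # a single prediction is off by 4

print(mean_squared_error(actual, small_errors))   # 1.0
print(mean_squared_error(actual, one_big_error))  # 4.0
```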
Root Mean Squared Error
The root-mean-square deviation (RMSD) or root-mean-square
error (RMSE) is a frequently used measure of the differences between the values
predicted by a model and the values actually observed.
It is the square root of the mean squared error.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for the i-th input of the test set,
and $n$ is the total number of test samples.
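As a rough illustration, RMSE can be computed by taking the square root of the mean squared error. The function name and the hour values below are assumptions made for this sketch.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean of the squared errors."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n
    return math.sqrt(mse)


# Hypothetical targets and predictions, both measured in hours.
actual_hours = [2.0, 4.0, 6.0, 8.0]
predicted_hours = [1.0, 5.0, 9.0, 7.0]

# Squared errors are 1, 1, 9 and 1, so MSE = 3 and RMSE = sqrt(3).
rmse = root_mean_squared_error(actual_hours, predicted_hours)
print(round(rmse, 4))  # 1.7321
```

Taking the square root brings the value back to the units of y (hours here), whereas MSE is in squared units.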
R2 Score/ Coefficient of Determination
It measures the proportion of the variation in the dependent variable explained by all
the independent variables in the model.
It assumes that every independent variable in the model helps to explain
variation in the dependent variable.
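It is measured as the ratio of the variance explained by the model to the total
variance of the data.
$$R^2 = \frac{\sum_{i=1}^{n} \left( \hat{y}_i - \bar{y} \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for the i-th input of the test set, $\bar{y}$ is
the mean of the actual values of y, and $n$ is the total number of test samples.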
R2 Score
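Alternatively, the R2 Score is measured from the unexplained variance as follows:
$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
where SSE denotes the sum of squared errors and SST denotes the total sum of squares.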
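The value of R2 is at most 1. For a least-squares line with an intercept, evaluated on the data
it was fitted to, R2 lies between 0 and 1. R2 is negative only when the chosen
model does not follow the trend of the data and so fits worse than a horizontal line at the mean of y;
in that case it has no lower bound.
Mathematically, this happens when the error sum of squares of the model (SSE) is
larger than the total sum of squares about the horizontal line (SST).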
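The behaviour of R2 at its extremes can be checked with a small sketch of the SSE/SST form. The helper name `r2_score` is a plain-Python function written for this example, and the three prediction sets are hypothetical.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SSE / SST, with SST taken about the mean of the actual values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / n
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    if sst == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - sse / sst


actual = [1.0, 2.0, 3.0, 4.0]          # SST about the mean 2.5 is 5.0
perfect = [1.0, 2.0, 3.0, 4.0]         # SSE = 0
mean_only = [2.5, 2.5, 2.5, 2.5]       # horizontal line at the mean, SSE = SST
against_trend = [4.0, 3.0, 2.0, 1.0]   # predictions run opposite to the data, SSE = 20

print(r2_score(actual, perfect))        # 1.0
print(r2_score(actual, mean_only))      # 0.0
print(r2_score(actual, against_trend))  # -3.0
```

The last case gives SSE larger than SST, so the score drops below zero.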
Significance of R2 Score
R-squared is a statistical measure of how close the data are to the fitted regression line.
• 0% indicates that the model explains none of the variability of the response data around
its mean.
• 100% indicates that the model explains all the variability of the response data around its
mean.
In general, the higher the R-squared, the better the model fits the data.
Evaluation Metrics- Numerical Example
Consider that the number of lectures per day (x)
affects the number of hours spent at university per
day (y).
The equation of the regression line is
$\hat{y} = 0.143 + 1.229x$
Find
(i) MAE
(ii) MSE
(iii) RMSE
(iv) R2 Score
for the test set shown in the table.

S.No   x   y
1      2   2
2      3   4
3      4   6
4      6   7
Evaluation Metrics- Numerical Example
S.No   x   y   ŷ = 0.143 + 1.229x   Error = y − ŷ   Abs. Error = |y − ŷ|   Sq. Error   y − mean(y)   SST
...
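The example can be worked through with a short script. This is a sketch using only the four (x, y) pairs and the regression line given in the example; the variable names are arbitrary, and R2 is computed in the 1 − SSE/SST form.

```python
import math

# Test set from the example: lectures per day (x), hours at university per day (y).
x = [2, 3, 4, 6]
y = [2, 4, 6, 7]


def predict(x_value):
    """Regression line given in the example: y_hat = 0.143 + 1.229 x."""
    return 0.143 + 1.229 * x_value


y_hat = [predict(v) for v in x]
errors = [yi - yh for yi, yh in zip(y, y_hat)]
n = len(y)
mean_y = sum(y) / n

mae = sum(abs(e) for e in errors) / n
sse = sum(e ** 2 for e in errors)            # sum of squared errors
mse = sse / n
rmse = math.sqrt(mse)
sst = sum((yi - mean_y) ** 2 for yi in y)    # total sum of squares
r2 = 1 - sse / sst

# Per-row columns corresponding to the table header.
print(f"{'S.No':>4} {'x':>3} {'y':>3} {'y_hat':>7} {'error':>7} {'abs':>6} {'sq':>7}")
for i, (xi, yi, yh, e) in enumerate(zip(x, y, y_hat, errors), start=1):
    print(f"{i:>4} {xi:>3} {yi:>3} {yh:>7.3f} {e:>7.3f} {abs(e):>6.3f} {e ** 2:>7.4f}")

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  SST={sst:.2f}  R2={r2:.3f}")
```

Run as written, this should report MAE ≈ 0.557, MSE ≈ 0.386, RMSE ≈ 0.621, SST = 14.75 and R2 ≈ 0.895.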
Adjusted R2 Score
It measures the proportion of variation explained by only those independent variables
that really affect the dependent variable.
It penalizes you for adding independent variables that do not affect the dependent
variable.
Every time you add an independent variable to a model, the R-squared computed on the training
data increases or stays the same, even if the independent variable is insignificant. It never declines.
Adjusted R-squared increases only when the added independent variable is significant and affects
the dependent variable.
Adjusted R2 Score
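Adjusted R2 is computed as follows:
$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
where $n$ is the number of samples and $k$ is the number of independent variables (predictors) in the model.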
The Adjusted R2 score should be used to compare regression models with different
numbers of predictors, and when we want to decide which predictors in our
training set are important.
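The penalty for extra predictors can be illustrated with a small sketch. It assumes the commonly used form Adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), with n samples and k predictors; the function name and the R2 values are hypothetical and chosen only to show the effect.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Hypothetical scenario: adding a fourth predictor nudges R^2 from 0.900 to 0.901.
base = adjusted_r2(0.900, n_samples=50, n_predictors=3)
extended = adjusted_r2(0.901, n_samples=50, n_predictors=4)

print(round(base, 4))      # 0.8935
print(round(extended, 4))  # 0.8922
```

Although the plain R2 went up slightly, the adjusted score went down, which suggests the extra predictor is not worth keeping in this made-up case.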