Variance and Bias


Regularization: Bias and Variance, L2 Regularization
Bias and Variance

• By looking at the algorithm’s error on the training set (which we call bias) and on the dev set (which we call variance), we can try different things to improve the algorithm.
• What is bias?
• Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem; this leads to high error on both training and test data.
• What is variance?
• Variance is the variability of the model’s prediction for a given data point; it tells us how spread out the predictions are. A model with high variance pays too much attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.
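To make these two failure modes concrete, here is a minimal NumPy sketch (the sine-curve target, sample sizes, and polynomial degrees are illustrative assumptions, not from the slides): a degree-1 fit underfits and shows high bias (high error on both sets), while a degree-15 fit overfits and shows high variance (low training error, much higher test error).

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a sine curve: y = sin(x) + noise
x_train = rng.uniform(0, 2 * np.pi, 20)
y_train = np.sin(x_train) + rng.normal(0, 0.3, x_train.size)
x_test = rng.uniform(0, 2 * np.pi, 200)
y_test = np.sin(x_test) + rng.normal(0, 0.3, x_test.size)

for degree in (1, 4, 15):  # too simple, reasonable, too flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The growing gap between training and test error as the degree increases is exactly the training-set vs dev-set comparison described in the first bullet above.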
How to address high bias?

• Bigger network, i.e., more hidden layers or more hidden units (see the sketch after this list).
• Train it longer.
• Try more advanced optimization algorithms or a better NN architecture.
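As a hedged sketch of the “bigger network” point (the 64-dimensional input and the layer widths are made-up values, not from the slides), going from an underfitting model to a larger one in PyTorch could look like this:

```python
import torch.nn as nn

# A small network that may underfit (high bias): one narrow hidden layer.
small_net = nn.Sequential(
    nn.Linear(64, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

# More hidden units and more hidden layers to drive down training error.
bigger_net = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
```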
How to address high variance (aka overfitting)?

• Add more data.
• Regularization, e.g., dropout or an L2 weight penalty (see the sketch after this list).
• Search for a better NN architecture.
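As a sketch of the regularization bullet (layer sizes, dropout probability, and weight-decay value are illustrative assumptions), in PyTorch one could add dropout layers and apply an L2 weight penalty through the optimizer’s weight_decay argument:

```python
import torch
import torch.nn as nn

# Dropout randomly zeroes activations during training, discouraging
# the network from relying too heavily on any single unit.
regularized_net = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)

# weight_decay applies an L2 penalty on the weights at each update step.
optimizer = torch.optim.Adam(regularized_net.parameters(),
                             lr=1e-3, weight_decay=1e-4)
```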
Why is there a Bias-Variance Tradeoff?

• If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is going to have high variance and low bias. So we need to find a good balance that neither overfits nor underfits the data.
• This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm can’t be more complex and less complex at the same time.
Total Error
To build a good model, we need to find a balance between bias and variance that minimizes the total error.
• Mathematically
• Let the variable we are trying to predict be Y and the other covariates be X. We assume there is a relationship between the two such that

Y = f(X) + e

• where e is the error term, normally distributed with a mean of 0.
• We will build a model f^(X) of f(X) using linear regression or any other modeling technique.
• So the expected squared error at a point x is

Err(x) = E[(Y − f^(x))²]

• Err(x) can be further decomposed as

Err(x) = (E[f^(x)] − f(x))² + E[(f^(x) − E[f^(x)])²] + σe²
Err(x) = Bias² + Variance + Irreducible error

That is, Err(x) is the sum of Bias², Variance, and the irreducible error σe² (the variance of the noise term e).
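A small Monte Carlo check of this decomposition can be run in NumPy (a sketch under assumed values: a sine-curve f, noise std 0.3, and a quadratic model, none of which come from the slides): fit the same model on many independent training sets, then compare Bias² + Variance + σe² with a direct estimate of Err(x).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_e = 0.3          # noise std; irreducible error = sigma_e**2 (assumed value)
x0 = 2.0               # the point x at which we decompose Err(x)
f = np.sin             # stand-in for the true (normally unknown) f

# Fit the same model class on many independent training sets, record f^(x0).
preds = []
for _ in range(2000):
    x = rng.uniform(0, 2 * np.pi, 20)
    y = f(x) + rng.normal(0, sigma_e, x.size)
    coeffs = np.polyfit(x, y, 2)          # a deliberately simple quadratic model
    preds.append(np.polyval(coeffs, x0))
preds = np.array(preds)

bias_sq = (preds.mean() - f(x0)) ** 2
variance = preds.var()
print("Bias^2 + Variance + sigma_e^2 :", bias_sq + variance + sigma_e**2)

# Direct Monte Carlo estimate of Err(x0) = E[(Y - f^(x0))^2] for comparison.
y0 = f(x0) + rng.normal(0, sigma_e, preds.size)
print("Direct estimate of Err(x0)    :", np.mean((y0 - preds) ** 2))
```

The two printed numbers should agree up to sampling noise, which is the decomposition above verified numerically.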


Irreducible error is the error that can’t be reduced by building better models. It is a measure of the amount of noise in our data. It is important to understand that no matter how good we make our model, our data will contain a certain amount of noise, or irreducible error, that cannot be removed.
Regularization
• The word “regularize” means to make things regular or acceptable. Regularization refers to techniques used to reduce error by fitting a function appropriately to the given training set while avoiding overfitting.
• Regularization is a set of techniques that can prevent overfitting in neural networks and thus improve the accuracy of a deep learning model when it faces completely new data from the problem domain.
• The commonly used regularization techniques are (an L2 sketch follows this list):
• L1 regularization
• L2 regularization
• Dropout regularization
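To ground the L2 case, here is a minimal NumPy sketch of L2-regularized (ridge) linear regression trained by gradient descent; the toy data, the lambda values, and the learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 samples, 10 features, only the first 3 actually matter.
X = rng.normal(size=(50, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(0, 0.1, 50)

def fit(lam, steps=2000, lr=0.05):
    """Gradient descent on (1/2n)*||Xw - y||^2 + lam * ||w||^2."""
    w = np.zeros(10)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y) + 2 * lam * w
        w -= lr * grad
    return w

# A larger lam shrinks all weights toward zero, trading a little bias
# for a reduction in variance.
print("lam = 0.0:", np.round(fit(0.0), 2))
print("lam = 1.0:", np.round(fit(1.0), 2))
```

L1 regularization would replace the lam * ||w||² term with lam * Σ|w|, which tends to push weights to exactly zero; dropout is applied inside the network at training time, as in the earlier PyTorch sketch.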
