Gradient Descent
When there is more than one input, you can optimize the coefficient values by
iteratively minimizing the error of the model on your training data. This
process is called Gradient Descent. It works by starting with random values
for each coefficient, then calculating the sum of squared errors for each
pair of input and output values.
A learning rate is used as a scale factor, and the coefficients are updated
in the direction that minimizes the error. The process is repeated until a
minimum sum of squared errors is achieved or no further improvement is
possible.
When using this method, the learning rate determines the size of the
improvement step taken on each iteration of the procedure. In practice,
Gradient Descent is most useful when the dataset is large, either in the
number of rows or the number of columns. The steps below apply the procedure
to a simple linear regression model, y = mx + c, with the mean squared error
loss E = (1/n) Σ (y_i - (m·x_i + c))².
1. Initially, let m = 0 and c = 0, and choose a learning rate L, which
controls how much the values of m and c change with each step. A smaller L
gives more precise but slower updates; L = 0.001 is a reasonable choice for
good accuracy.
2. Calculate the partial derivative of the loss function with respect to m,
plugging in the current values of x, y, m and c, to get the derivative
D_m = (-2/n) Σ x_i (y_i - (m·x_i + c)).
3. Similarly, calculate the partial derivative with respect to c:
D_c = (-2/n) Σ (y_i - (m·x_i + c)).
4. Update the coefficients using the derivatives and the learning rate:
m = m - L·D_m and c = c - L·D_c.
5. Repeat steps 2-4 until the loss function becomes very small, ideally
approaching 0, i.e. its minimum (a code sketch of these steps follows this
list).
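The following is a minimal sketch of steps 1-5 for simple linear regression.
The toy data, iteration count, and default learning rate are illustrative
assumptions, not part of the procedure itself.

import numpy as np

def gradient_descent(x, y, L=0.001, iterations=50_000):
    m, c = 0.0, 0.0                                # step 1: start at m = 0, c = 0
    n = len(x)
    for _ in range(iterations):
        y_pred = m * x + c                         # current predictions
        D_m = (-2 / n) * np.sum(x * (y - y_pred))  # step 2: derivative wrt m
        D_c = (-2 / n) * np.sum(y - y_pred)        # step 3: derivative wrt c
        m = m - L * D_m                            # step 4: update m
        c = c - L * D_c                            # step 4: update c
    return m, c

# Toy data lying close to the line y = 2x + 1 (assumed for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])
m, c = gradient_descent(x, y)
print(f"m = {m:.3f}, c = {c:.3f}")  # converges to the least-squares fit (m ≈ 1.95, c ≈ 1.15)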
When the learning rate is too high, the updates overshoot the minimum and the
process becomes haphazard; the loss may bounce around or even diverge. A
smaller learning rate generally gives a more stable descent, at the cost of
requiring more iterations.
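One way to see this effect, reusing the same assumed toy data as above, is to
run the update loop with a small and a deliberately oversized learning rate
and compare the resulting loss.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])
n = len(x)

for L in (0.001, 0.2):  # small step size vs. overly large step size
    m = c = 0.0
    for _ in range(100):
        y_pred = m * x + c
        m = m - L * (-2 / n) * np.sum(x * (y - y_pred))
        c = c - L * (-2 / n) * np.sum(y - y_pred)
    loss = np.mean((y - (m * x + c)) ** 2)
    print(f"L = {L}: loss after 100 steps = {loss:.4g}")
# The small rate descends steadily; the large one overshoots and blows up.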
Repeating this process many times is like a person taking small steps down
into a valley: eventually they arrive at the bottom, which corresponds to the
minimum of the loss. With the optimal values of m and c, the model is ready.