Gradient Descent


Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model.
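In its most general form, each step moves a parameter a small distance against the gradient. Here is a minimal sketch of one such step in Python; the quadratic f(x) = x² and its gradient 2x are illustrative choices, not from the text:

    def gradient_descent_step(theta, grad, learning_rate=0.1):
        """One gradient descent update: move against the gradient."""
        return theta - learning_rate * grad

    # Illustrative example: minimize f(x) = x**2, whose gradient is 2*x.
    x = 5.0
    for _ in range(100):
        x = gradient_descent_step(x, grad=2 * x)
    print(x)  # ~1e-9: essentially at the minimizer x = 0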

When there is more than one input, you can optimize the values of the coefficients by iteratively minimizing the error of the model on your training data. This is called Gradient Descent and works by starting with random values for each coefficient. The sum of squared errors is then calculated for each pair of input and output values.

A learning rate is used as a scale factor: for each pair of input and output values, the coefficients are updated a small step in the direction that minimizes the error. The process is repeated until a minimum sum of squared errors is achieved or no further improvement is possible.
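A minimal sketch of this per-pair (stochastic) update in Python; the data, learning rate, and epoch count are hypothetical:

    import random

    def sgd_fit(x, y, learning_rate=0.01, epochs=200):
        """Update the coefficients one (input, output) pair at a time."""
        m, c = random.random(), random.random()  # start with random coefficients
        for _ in range(epochs):
            for xi, yi in zip(x, y):
                error = (m * xi + c) - yi  # prediction error for this pair
                # nudge each coefficient toward lower squared error
                m -= learning_rate * error * xi
                c -= learning_rate * error
        return m, c

    # Hypothetical data lying roughly on y = 2x + 1.
    x = [1, 2, 3, 4, 5]
    y = [3.0, 5.1, 6.9, 9.1, 10.9]
    print(sgd_fit(x, y))  # roughly (2, 1)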

When using this method, the learning rate (alpha) determines the size of the improvement step taken on each iteration of the procedure. In practice, Gradient Descent is most useful when the dataset is large, either in the number of rows or the number of columns.

In short, it is a minimization algorithm meant for minimizing a given loss function. For a line y = mx + c, the Gradient Descent algorithm determines the values of m and c such that the corresponding line is the best-fitting line, i.e. gives the minimum error.

First we need to calculate the loss function. The loss is defined as the error in the y values predicted using our current m and c. Our aim is to minimize this error to obtain the most accurate values of m and c. To calculate the loss, the Mean Squared Error function is used:

· The difference between the actual and predicted y values is found.

· The difference is squared.

· Then, the mean of the squares over every value in x is calculated.
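In symbols, the loss is E = (1/n) Σ (y_i − ŷ_i)², where ŷ_i = m·x_i + c is the predicted value. A short Python sketch of the three steps; the sample data are hypothetical:

    def mean_squared_error(y_actual, y_predicted):
        """Mean of the squared differences between actual and predicted y."""
        n = len(y_actual)
        return sum((ya - yp) ** 2 for ya, yp in zip(y_actual, y_predicted)) / n

    # Hypothetical data and predictions from the line y = 2x + 1.
    x = [1, 2, 3, 4]
    y = [3.1, 4.9, 7.2, 8.8]
    y_pred = [2 * xi + 1 for xi in x]
    print(mean_squared_error(y, y_pred))  # 0.025: the line fits well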

Now, we have to minimize the loss with respect to m and c. To find the minimizing values of these parameters, the Gradient Descent Algorithm is used.

1. Initially, let m = 0 and c = 0. Let L be the learning rate, controlling how much the values of m and c change with each step. A small value such as L = 0.001 is a common choice: the smaller L is, the more precise (but slower) the updates.

2. Calculate the partial derivative of the loss function with respect to m, plugging in the current values of x, y, m and c, to get the derivative D_m = (−2/n) Σ x_i (y_i − ŷ_i).

3. Similarly, the derivative with respect to c is D_c = (−2/n) Σ (y_i − ŷ_i).

4. Now, update the current values of m and c: m = m − L·D_m and c = c − L·D_c.

5. We repeat this process until the loss function is very small, i.e. ideally 0% error (100% accuracy).
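Putting steps 1 to 5 together, here is a minimal batch gradient descent sketch in Python; the data, learning rate, and iteration count are illustrative choices:

    def fit_line(x, y, learning_rate=0.001, iterations=10000):
        """Fit y = m*x + c by gradient descent on the Mean Squared Error."""
        m, c = 0.0, 0.0  # step 1: initialize m = 0, c = 0
        n = len(x)
        for _ in range(iterations):
            y_pred = [m * xi + c for xi in x]
            # steps 2-3: partial derivatives of the loss wrt m and c
            d_m = (-2 / n) * sum(xi * (yi - ypi) for xi, yi, ypi in zip(x, y, y_pred))
            d_c = (-2 / n) * sum(yi - ypi for yi, ypi in zip(y, y_pred))
            # step 4: update m and c against the gradient, scaled by L
            m -= learning_rate * d_m
            c -= learning_rate * d_c
        return m, c

    # Hypothetical data lying roughly on y = 2x + 1.
    x = [1, 2, 3, 4, 5]
    y = [3.0, 5.1, 6.9, 9.1, 10.9]
    m, c = fit_line(x, y)
    print(m, c)  # close to 2 and 1, the best-fitting line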

When the learning rate is too high, the steps overshoot and the process becomes very haphazard. So, the smaller the learning rate, the more stable the result of the model will be, at the cost of more iterations.
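As a quick illustration using the fit_line sketch above (the learning-rate values are arbitrary):

    # Too large a step: the updates overshoot and the values blow up.
    print(fit_line(x, y, learning_rate=0.1, iterations=100))
    # A small step is stable but needs many iterations to approach (2, 1).
    print(fit_line(x, y, learning_rate=0.001, iterations=10000))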

After repeating this process many times, like a person walking downhill one step at a time, the algorithm arrives at the bottom of the valley, i.e. the minimum of the loss. With the optimum values of m and c, the model is ready.
