
B.E. (COMP) Sinhgad Institute of Technology, Lonavala LP_III

Name of the Student: __________________________________ Roll No: ______


CLASS: B.E. [COMP]    Division: A, B, C    Course: LP-III
Machine Learning
Assignment No. 04
GRADIENT DESCENT ALGORITHM
Marks: /10

Date of Performance: ___/___/2023-24        Sign with Date: __________

Title: Implement Gradient Descent Algorithm to find the local minima of a function.

Objectives:
• To implement Gradient Descent Algorithm to find the local minima of a function.

Outcomes:
• Find the local minima of a function.

PEOs, POs, PSOs and COs satisfied


PEOs: I, III POs: 1, 2, 3, 4, 5 PSOs: 1, 2 COs: 1

Problem Statement:
Implement Gradient Descent Algorithm to find the local minima of a function.
For example, find the local minima of the function y=(x+3)² starting from the point x=2.

Theory:

Gradient Descent in Machine Learning

Gradient Descent is one of the most commonly used optimization algorithms for training
machine learning models; it works by minimizing the error between actual and predicted
results. It is also used to train neural networks.

In mathematical terms, optimization refers to the task of minimizing or maximizing an
objective function f(x) parameterized by x. Similarly, in machine learning, optimization
is the task of minimizing the cost function parameterized by the model's parameters. The
main objective of gradient descent is to minimize a convex cost function through iterative
parameter updates. Once optimized, these models can be used as powerful tools for
Artificial Intelligence and various computer science applications.

1 | Department of Computer Engineering, SIT, Lonavala



What is Gradient Descent or Steepest Descent?

Gradient descent was first proposed by Augustin-Louis Cauchy in 1847, in the mid-19th
century. It is one of the most commonly used iterative optimization algorithms in machine
learning, used to train machine learning and deep learning models, and it helps in finding
a local minimum of a differentiable function.

The local minimum or local maximum of a function can be reached using the gradient as
follows:

o If we move towards the negative gradient, i.e. away from the gradient of the function at
the current point, we approach a local minimum of that function.
o If we move towards the positive gradient, i.e. towards the gradient of the function at
the current point, we approach a local maximum of that function.

The first procedure is known as Gradient Descent, also called steepest descent; the second
is known as Gradient Ascent. The main objective of the gradient descent algorithm is to
minimize the cost function through iteration. To achieve this goal, it performs two steps
iteratively:

o Calculate the first-order derivative of the function to compute the gradient, or slope,
at the current point.
o Move in the direction opposite to the gradient, by alpha times the gradient, where alpha
is the learning rate. The learning rate is a tuning parameter in the optimization process
that decides the length of the steps.
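The two steps above can be sketched in Python. This is a minimal illustration using f(x) = (x + 3)² from the problem statement; the function names and the learning-rate value are our choices:

```python
# Minimal sketch of the two iterative steps of gradient descent,
# using f(x) = (x + 3)**2, whose first-order derivative is 2*(x + 3).

def gradient(x):
    # Step 1: compute the gradient (slope) at the current point
    return 2 * (x + 3)

def update(x, alpha):
    # Step 2: move opposite to the gradient, scaled by the learning rate
    return x - alpha * gradient(x)

x = 2.0        # starting point from the problem statement
alpha = 0.1    # learning rate (tuning parameter)
for _ in range(5):
    x = update(x, alpha)
# x moves from 2.0 towards the minimum at x = -3
```

Each iteration multiplies the distance to the minimum by (1 - 2·alpha), so with alpha = 0.1 the distance shrinks by 20% per step.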

What is Cost-function?

The cost function measures the difference, or error, between the actual values and the
predicted values at the current position, expressed as a single real number. It improves
machine learning efficiency by providing feedback to the model, so that the model can
reduce its error and reach a local or global minimum.
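As an illustration, a cost function can be as simple as the mean squared error between actual and predicted values. This is a hypothetical sketch; the function name `mse` is ours:

```python
# Sketch of a cost function: mean squared error between actual and
# predicted values, reported as a single real number.
def mse(actual, predicted):
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# Smaller values mean better predictions; this feedback signal is
# what gradient descent tries to minimize.
error = mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```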




How does Gradient Descent work?

Before examining the working principle of gradient descent, we should recall how to find
the slope of a line in linear regression. The equation of simple linear regression is:

Y = mX + c

where 'm' represents the slope of the line and 'c' represents the intercept on the y-axis.

The starting point is an arbitrary point used to evaluate performance. At this starting
point, we compute the first derivative, or slope, and use the tangent line to measure the
steepness of that slope. This slope then informs the updates to the parameters (weights
and bias).

The slope is steep at the starting point, but as new parameters are generated the
steepness gradually reduces, until the algorithm reaches the lowest point of the function,
called the point of convergence.

The main objective of gradient descent is to minimize the cost function, i.e. the error
between expected and actual values. To minimize the cost function, two factors are
required:

o Direction & Learning Rate

These two factors determine the partial-derivative calculations of future iterations and
drive the algorithm towards the point of convergence, i.e. a local or global minimum.
Let's discuss the learning rate in brief:

Learning Rate:

It is defined as the step size taken to reach the minimum or lowest point. It is typically
a small value that is evaluated and updated based on the behavior of the cost function. A
high learning rate results in larger steps but risks overshooting the minimum, while a low
learning rate takes small steps, which compromises overall efficiency but gives the
advantage of more precision.
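The effect of the learning rate can be demonstrated on y = (x + 5)², the function used in the example below. This is a sketch with illustrative values; the helper name `run` is ours:

```python
# Demonstration of learning-rate choice on y = (x + 5)**2,
# whose gradient is 2 * (x + 5).
def run(lr, iters=10, x=3.0):
    for _ in range(iters):
        x = x - lr * 2 * (x + 5)   # gradient descent update
    return x

precise = run(0.01)   # small steps: slow but steady progress towards -5
diverged = run(1.1)   # steps too large: each update overshoots the
                      # minimum and lands farther away than before
```

With lr = 0.01 the distance to x = -5 shrinks by 2% per step; with lr = 1.1 the distance grows by 20% per step, so the iterates oscillate around the minimum with increasing amplitude.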

Example:
Question: Find the local minima of the function y = (x+5)² starting from the point x = 3.

Solution: We know the answer just by looking at the graph. y = (x+5)² reaches its minimum
value when x = -5 (i.e. when x = -5, y = 0). Hence x = -5 is the local and global minimum
of the function.
Now, let's see how to obtain the same result numerically using gradient descent.
Step 1: Initialize x = 3. Then, find the gradient of the function, dy/dx = 2*(x+5).
Step 2: Move in the direction of the negative of the gradient. But how far should we move?
For that, we require a learning rate. Let us assume the learning rate = 0.01.




Step 3: Let's perform 2 iterations of gradient descent:

Iteration 1: x1 = x0 - 0.01 * 2 * (3 + 5) = 3 - 0.16 = 2.84
Iteration 2: x2 = x1 - 0.01 * 2 * (2.84 + 5) = 2.84 - 0.1568 = 2.6832

Step 4: We can observe that the x value is slowly decreasing and should converge to -5
(the local minimum). However, how many iterations should we perform?
Let us set a precision variable in our algorithm that calculates the difference between
two consecutive x values. If the difference between the x values of two consecutive
iterations is less than the precision we set, we stop the algorithm.
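The complete procedure from the example can be sketched as follows. This is a minimal implementation; the function name and parameter defaults are our choices:

```python
# Gradient descent on y = (x + 5)**2 starting from x = 3, with the
# precision-based stopping rule described above.
def gradient_descent(start=3.0, lr=0.01, precision=1e-6, max_iters=100000):
    x = start
    for _ in range(max_iters):
        new_x = x - lr * 2 * (x + 5)    # dy/dx = 2 * (x + 5)
        if abs(new_x - x) < precision:  # consecutive x values close enough
            return new_x
        x = new_x
    return x

minimum = gradient_descent()
# minimum converges close to the true local minimum x = -5
```

The `max_iters` cap is a safeguard so the loop terminates even if the precision threshold is never reached, e.g. when the learning rate is set too high.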

Conclusion:
Thus, we implemented the Gradient Descent algorithm in Python to find the local minimum of
a function.

A. Write short answer of following questions:


1. What is gradient descent? Explain with an example.
2. Which ML algorithms use gradient descent?
3. How is gradient descent useful in machine learning implementations?
4. What happens when the learning rate is too high in gradient descent?
5. How many types of gradient descent are there? Which type is best?

