Lecture 1: Gradients and More


CS 4301

Numerical Methods in Machine Learning and Data Science
Nicholas Ruozzi
University of Texas at Dallas
Course Info.
• Instructor: Nicholas Ruozzi
• Office: ECSS 3.409 (online this semester)

• Office hours: M 11:30am-12:30pm, W 12:30pm-1:30pm

• TA: TBD
• Office hours and location: TBD

• Course website: www.utdallas.edu/~nicholas.ruozzi/cs4301/2020fa/


• Book: none required
• Piazza (online forum): sign-up link on eLearning

2
Prerequisites

• CS3345, Data Structures and Algorithms


• “Mathematical sophistication”
• Linear algebra: eigenvalues/vectors, matrices, vectors,
etc.
• Multivariate calculus: derivatives, gradients, etc.
• Take prerequisite “quiz” on eLearning (coming soon)

3
Grading
• 5-6 problem sets (70%)
• See collaboration policy on the web

• Mix of theory and programming (in Python)

• Available and turned in on eLearning

• Approximately one assignment every two weeks


• 2 Exams (30%)

• Attendance policy?

(subject to change)
4
What this course is about...
• Problem solving using techniques from continuous optimization
• Growth of AI, ML, and Data Science means that you are likely
to encounter these problems in the real world
• Understand how continuous optimization techniques work and
how to interpret their outputs
• Emphasis is on understanding how to formulate problems and
apply existing tools to (approximately) solve them
• We’ll see a bit of the theory behind these methods too, but
the emphasis will be on applying them

5
Course Topics
Part 1: Numerical Optimization
• Unconstrained optimization (partial derivatives, gradients, local versus global
optimality)
• Convex functions and gradient descent
• Linear programming
• Constrained convex optimization problems
• Method of Lagrange multipliers
• Projected gradient methods
• Frank-Wolfe/Method of conditional gradients

Part 2: Numerical Methods in Linear Algebra


• Linear algebra review (matrices, vectors, dot products, inverses, rank)
• Least squares regression (via gradient descent)
• Matrix and vector norms
• Eigenvalues and eigenvectors (Courant-Fisher Theorem and power iteration)
• Singular value decomposition (SVD as a convex optimization problem)
• Nonnegative matrix factorization (CUR decomposition, convex combinations)

6
Today: Calculus Refresher

7
Univariate Calculus Review

• Given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, how do you find its global minimum?

8
Univariate Calculus Review

• Given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, how do you find its global minimum?

9
Univariate Calculus Review

• Given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, how do you find its global minimum?
• The global minimum can only occur in one of three places:
• An $x$ such that $f'(x) = 0$
• In the limit as $x \to \infty$
• In the limit as $x \to -\infty$

10
Univariate Calculus Review

• Given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, how do you find its global minimum?
• The global minimum can only occur in one of three places:
• An $x$ such that $f'(x) = 0$
• In the limit as $x \to \infty$
• In the limit as $x \to -\infty$
• Note that in the last two cases, the global minimum need not be achievable at any finite $x$
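
A minimal Python sketch of this three-case check, using sympy (the function f below is a made-up example, not one from the lecture):

import sympy as sp

x = sp.symbols('x', real=True)
f = x**4 - 3*x**2 + 1                        # illustrative example function

stationary = sp.solve(sp.diff(f, x), x)      # points where f'(x) = 0
lim_pos = sp.limit(f, x, sp.oo)              # behavior as x -> +infinity
lim_neg = sp.limit(f, x, -sp.oo)             # behavior as x -> -infinity

candidates = [f.subs(x, c) for c in stationary] + [lim_pos, lim_neg]
print(min(candidates))                       # -5/4, attained at x = +/- sqrt(3/2)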

11
Simple Derivatives
• To find the places where the derivative is equal to zero, we can
employ the standard derivative rules from calculus:
• $\frac{d}{dx} x^a = a x^{a-1}$ for any $a \in \mathbb{R}$
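
As a quick sanity check, sympy can apply these rules symbolically (a small illustrative sketch, not part of the slides):

import sympy as sp

x, a = sp.symbols('x a')
print(sp.diff(x**a, x))           # the power rule: a*x**(a - 1), possibly printed as a*x**a/x
print(sp.diff(x**3 - 2*x, x))     # 3*x**2 - 2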

12
Min vs. Inf
• The global minimum may also not exist!
• For example, consider the function $f(x) = e^x$

13
Min vs. Inf
• The global minimum may also not exist!
• For example, consider the function $f(x) = e^x$
• This function has no finite minimum
• In this case, it doesn’t make sense to write $\min_x e^x$
• The infimum of a function $f$ over $\mathbb{R}$ is defined to be the largest
value $c$ such that $f(x) \geq c$ for all $x$, or $-\infty$ if no such number exists
• This allows us to write $\inf_x e^x = 0$
• Similarly, $\inf_x x = -\infty$

14
Min vs. Inf
• Infimums are always well-defined even if the function is
unbounded from below
• The analogous concept for maximization is called the
supremum, i.e., the smallest upper bound
• For example, $\sup_x -e^x = 0$
• Note that $\sup_x f(x) = -\inf_x (-f(x))$
• As a result, we can always talk about minimizing functions
even if we want to maximize one
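
This identity is handy in practice: to maximize with an off-the-shelf minimizer, just minimize the negation. A minimal sketch with scipy (the concave function below is a made-up example):

from scipy.optimize import minimize_scalar

f = lambda x: -(x - 2.0)**2 + 3.0           # concave, maximum value 3 at x = 2

result = minimize_scalar(lambda x: -f(x))   # sup_x f(x) = -inf_x (-f(x))
print(result.x, -result.fun)                # approx. 2.0 and 3.0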

15
Optima Under Simple Constraints

• Given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, how do you find its minimum on the interval $[a, b]$?

16
Optima Under Simple Constraints

• Given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, how do you find its minimum on the interval $[a, b]$?
• The minimum of $f$ on $[a, b]$ can occur in one of three places:
• An $x \in (a, b)$ such that $f'(x) = 0$
• At the lower boundary $x = a$
• At the upper boundary $x = b$

17
Optima Under Simple Constraints

• Given a differentiable function $f : \mathbb{R} \to \mathbb{R}$, how do you find its minimum on the interval $[a, b]$?
• The minimum of $f$ on $[a, b]$ can occur in one of three places:
• An $x \in (a, b)$ such that $f'(x) = 0$
• At the lower boundary $x = a$
• At the upper boundary $x = b$
• The optimization problem becomes much more challenging with
more complex constraints!
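
A minimal sympy sketch of this recipe: collect the interior stationary points, add both endpoints, and take the smallest value (the function and interval are illustrative choices):

import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 - 3*x          # illustrative example function
a, b = -3, 2            # the interval [a, b]

interior = [c for c in sp.solve(sp.diff(f, x), x) if a < c < b]
candidates = interior + [a, b]
best = min(candidates, key=lambda c: f.subs(x, c))
print(best, f.subs(x, best))     # x = -3, f(-3) = -18: the minimum is at the boundary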

18
Multivariable Calculus Review

• Given a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, how do you find its global minimum?

19
Vectors and Matrices
• A vector $x \in \mathbb{R}^n$ is a tuple of $n$ real numbers, where $x_i$ for $i \in \{1, \dots, n\}$ denotes the $i$-th element of the tuple
• Similarly, a matrix $A \in \mathbb{R}^{m \times n}$ is indexed by a pair $(i, j)$, and $A_{ij}$ denotes its $(i, j)$ entry
• All vectors will be treated as column vectors, i.e., they can be
thought of as matrices with $n$ rows and 1 column
• If $x \in \mathbb{R}^n$, $x^T$ denotes the transpose of $x$, which is a row vector
• If $x, y \in \mathbb{R}^n$, then $x^T y$ is the dot product between $x$ and $y$

20
Partial Derivatives
• The partial derivative of a multivariate function $f(x_1, \dots, x_n)$ with respect to
$x_i$, denoted $\frac{\partial f}{\partial x_i}$, is the derivative of $f$ with respect to $x_i$ assuming that all of the
remaining variables are treated as constants
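
For example, with sympy (the function below is a made-up illustration), differentiating with respect to $x_1$ treats $x_2$ as a constant:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * x2 + sp.sin(x2)

print(sp.diff(f, x1))    # 2*x1*x2          (the sin(x2) term acts as a constant)
print(sp.diff(f, x2))    # x1**2 + cos(x2)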

21
Multivariate Chain Rule
• Let $f(y_1, y_2)$, $y_1(x)$, $y_2(x)$ be differentiable functions; then

$$\frac{d}{dx} f(y_1(x), y_2(x)) = \frac{\partial f}{\partial y_1} \frac{dy_1}{dx} + \frac{\partial f}{\partial y_2} \frac{dy_2}{dx}$$

• Applies, in the same way, to functions of more than two
variables as well
• Example: $f(y_1, y_2) = y_1 y_2$, $y_1(x) = x^2$, $y_2(x) = \sin(x)$, so $\frac{d}{dx} x^2 \sin(x) = 2x \sin(x) + x^2 \cos(x)$ (checked below)
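
The example above can be verified mechanically with sympy; this sketch compares the chain-rule expansion against differentiating the composition directly:

import sympy as sp

x, y1, y2 = sp.symbols('x y1 y2')
f = y1 * y2
subs = {y1: x**2, y2: sp.sin(x)}

# (df/dy1)(dy1/dx) + (df/dy2)(dy2/dx)
chain = (sp.diff(f, y1).subs(subs) * sp.diff(x**2, x)
         + sp.diff(f, y2).subs(subs) * sp.diff(sp.sin(x), x))
direct = sp.diff(f.subs(subs), x)    # differentiate the composition directly

print(sp.simplify(chain - direct))   # 0: the two agree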

22
Gradients
• The gradient of $f : \mathbb{R}^n \to \mathbb{R}$ at $x$ is the vector of partial derivatives, denoted $\nabla f(x)$,
whose $i$-th entry is $\frac{\partial f}{\partial x_i}(x)$
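
A numerical companion: the gradient can be approximated one partial derivative at a time with central differences (a minimal numpy sketch; the step size and test function are illustrative choices):

import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x by central differences."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h                                  # perturb only coordinate i
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

f = lambda x: x[0]**2 + 3 * x[0] * x[1]              # gradient: (2*x1 + 3*x2, 3*x1)
print(numerical_gradient(f, np.array([1.0, 2.0])))   # approx. [8. 3.]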

23
Directional Derivatives
• Unlike functions of a single variable, there is not a single
derivative that describes the rate of change at the point $x$

24
Directional Derivatives
• Unlike functions of a single variable, there is not a single
derivative that describes the rate of change at the point $x$
• Consider $f : \mathbb{R}^n \to \mathbb{R}$
• Let $g(t) = f(x_1, \dots, x_{i-1}, t, x_{i+1}, \dots, x_n)$
• $g'(x_i) = \frac{\partial f}{\partial x_i}(x)$, which is the slope of $g$ at the point $t = x_i$
• This means that $\frac{\partial f}{\partial x_i}(x)$ is the slope of the function when
considered only as a function of $x_i$
• Equivalently, this is the slope of $f$ in the direction $d$ where $d_i = 1$
and all other entries of $d$ are zero

25
Directional Derivatives
• The directional derivative of a differentiable function $f$ at $x$ along
the vector $d$ is given by $g'(0)$, where $g(t) = f(x + td)$
• Equivalently, by the multivariate chain rule, $g'(0) = \nabla f(x)^T d$

• Example: $f(x_1, x_2) = x_1^2 + x_2^2$, $d = \frac{1}{\sqrt{2}}(1, 1)^T$, so $\nabla f(x)^T d = \sqrt{2}(x_1 + x_2)$
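
A quick numerical check of this identity on the example above: the finite-difference slope of $g(t) = f(x + td)$ at $t = 0$ should match $\nabla f(x)^T d$ (the point and direction are illustrative):

import numpy as np

f = lambda x: x[0]**2 + x[1]**2
grad_f = lambda x: np.array([2 * x[0], 2 * x[1]])

x = np.array([1.0, 2.0])
d = np.array([1.0, 1.0]) / np.sqrt(2)    # a unit-length direction

t = 1e-6
print((f(x + t * d) - f(x)) / t)         # slope of g(t) = f(x + t*d) at t = 0
print(grad_f(x) @ d)                     # both approx. 4.2426 = sqrt(2)*(1 + 2)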

26
Directional Derivatives
• For differentiable functions, the gradient gives the direction of
steepest ascent in $f$
• It maximizes the directional derivative, $\nabla f(x)^T d$, over all possible unit
length directions $d$
• Recall that $u^T v = \|u\| \|v\| \cos \theta$, where $\theta$ is the angle between the two
vectors (in the 2d plane that contains them both)
• $d$ is unit length if $\|d\| = 1$
• So, $\nabla f(x)^T d \leq \|\nabla f(x)\|$
• Choosing $d = \nabla f(x) / \|\nabla f(x)\|$ attains this upper bound
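
A small simulation makes this concrete: among many random unit directions, none beats the gradient's norm (the fixed gradient below is a made-up example):

import numpy as np

rng = np.random.default_rng(0)
grad = np.array([3.0, -4.0])        # a fixed example gradient; its norm is 5

best = -np.inf
for _ in range(10000):
    d = rng.normal(size=2)
    d /= np.linalg.norm(d)          # a random unit-length direction
    best = max(best, grad @ d)

print(best, np.linalg.norm(grad))   # best is close to, but never above, 5.0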

27
Directional Derivatives

• If the directional derivative of a differentiable function $f$ at $x$ in the
direction $d$ is equal to zero for all $d$, then we say that $x$ is a critical
point
• This can only happen when $\nabla f(x) = 0$
• Local maxima and minima of differentiable functions can only
occur at places in which the gradient is zero

28
Local Optima
Critical points can be local minima, local maxima, or saddle points

29
Local Optima
Critical points can be local minima, local maxima, or saddle points

[Figure: surface plot illustrating critical points (source: Wikipedia)]

30
Multivariable Calculus Review

• Given a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, how do you find its global minimum?
• The global minimum can only occur in one of two places:
• An $x$ such that $\nabla f(x) = 0$
• In the limit $\lim_{t \to \infty} f(td)$ for some direction $d$
• Note that we have to look along all possible directions $d$ in the
limiting cases

31
Hessian Matrix
• The Hessian is the $n \times n$ matrix of second partial derivatives, denoted $\nabla^2 f(x)$,
whose $(i, j)$ entry is $\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$

• For twice differentiable functions, the Hessian describes the
local curvature of the function
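
sympy can build the Hessian of a small example directly (the function below is made up); note that the result is symmetric, as the next slide discusses:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 + 2 * x1 * x2 + x2**2

H = sp.hessian(f, (x1, x2))   # matrix of second partial derivatives
print(H)                      # Matrix([[6*x1, 2], [2, 2]])
print(H == H.T)               # True: the Hessian is symmetric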

32
Hessian Matrix
• The Hessian is the $n \times n$ matrix of second partial derivatives, denoted $\nabla^2 f(x)$,
whose $(i, j)$ entry is $\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$

• For twice continuously differentiable functions, the Hessian is always a
symmetric matrix, i.e., $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$

33
Constrained Optimization
• How do we minimize a multivariate function with respect to
simple constraints, e.g., minimize $f(x_1, x_2)$ subject to $x_1 \in [a, b]$ and $x_2 \in [c, d]$?

34
Constrained Optimization
• How do we minimize a multivariate function with respect to
simple constraints, e.g., minimize $f(x_1, x_2)$ subject to $x_1 \in [a, b]$ and $x_2 \in [c, d]$?
• This is much more challenging than the univariate case!
• The optima can occur at a critical point, $\nabla f(x) = 0$, or anywhere on the
boundary of the rectangle determined by $[a, b] \times [c, d]$!
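
In practice, box constraints like these can be handed to off-the-shelf solvers. A minimal scipy sketch (the function and the rectangle bounds are illustrative stand-ins):

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 3)**2 + (x[1] + 1)**2   # unconstrained minimum at (3, -1)
a, b = 0.0, 1.0                               # x1 constrained to [a, b]
c, d = 0.0, 2.0                               # x2 constrained to [c, d]

result = minimize(f, x0=np.array([0.5, 1.0]), bounds=[(a, b), (c, d)])
print(result.x)    # approx. [1. 0.]: the constrained minimum lies on the boundary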


35
