Lecture 1: Gradients and More
• TA: TBD
• Office hours and location: TBD
Prerequisites
Grading
• 5-6 problem sets (70%)
• See collaboration policy on the web
• Attendance policy?
(subject to change)
What this course is about...
• Problem solving using techniques from continuous optimization
• The growth of AI, ML, and Data Science means that you are likely
to encounter these problems in the real world
• Understand how continuous optimization techniques work and
how to interpret their outputs
• Emphasis is on understanding how to formulate problems and
apply existing tools to (approximately) solve them
• We’ll see a bit of the theory behind these methods too, but
the emphasis will be on applying them
Course Topics
Part 1: Numerical Optimization
• Unconstrained optimization (partial derivatives, gradients, local versus global
optimality)
• Convex functions and gradient descent
• Linear programming
• Constrained convex optimization problems
• Method of Lagrange multipliers
• Projected gradient methods
• Frank-Wolfe/Method of conditional gradients
Univariate Calculus Review
Simple Derivatives
• To find the places where the derivative is equal to zero, we can
employ the standard derivative rules from calculus:
• e.g., the power rule: $\frac{d}{dx}\, x^a = a x^{a-1}$ for any real $a$ (a short symbolic sketch follows below)
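To make this concrete, here is a minimal SymPy sketch of finding the points where the derivative equals zero; the polynomial $f(x) = x^2 - 4x + 1$ is a made-up example, not one from the lecture:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**2 - 4*x + 1                     # hypothetical example function

df = sp.diff(f, x)                     # f'(x) = 2*x - 4
print(sp.solve(sp.Eq(df, 0), x))       # critical points: [2]
print(sp.diff(f, x, 2))                # f''(x) = 2 > 0, so x = 2 is a minimum
```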
Min vs. Inf
• The global minimum may also not exist!
• For example, consider the function
• This function has no finite minimum
• In this case, it doesn't make sense to write $\min_{x} f(x)$
• The infimum of a function $f$ over a set $S$ is defined to be the largest
value $c$ such that $f(x) \geq c$ for all $x \in S$, or $-\infty$ if no such number exists
• This allows us to write $\inf_{x} f(x) = -\infty$
• Similarly,
Min vs. Inf
• Infimums are always well-defined even if the function is
unbounded from below
• The analogous concept for maximization is called the
supremum, i.e., the smallest upper bound
• For example,
• Note that $\sup_{x} f(x) = -\inf_{x} \bigl(-f(x)\bigr)$ (a worked example follows below)
• As a result, we can always talk about minimizing functions
even if we want to maximize one
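To make the max-to-min reduction concrete, here is a small worked example; the function $f(x) = -(x-1)^2$ is a made-up stand-in, since the slide's own example was not recoverable:

$$\sup_{x \in \mathbb{R}} \bigl(-(x-1)^2\bigr) \;=\; 0 \;=\; -\inf_{x \in \mathbb{R}} (x-1)^2 ,$$

so maximizing $f$ over $\mathbb{R}$ is equivalent to minimizing $-f$ and negating the resulting optimal value.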
Optima Under Simple Constraints
Multivariable Calculus Review
Vectors and Matrices
• A vector $x \in \mathbb{R}^n$ is a tuple of $n$ real numbers, where $x_i$ for $i \in \{1, \dots, n\}$ denotes the $i^{\text{th}}$ element of the tuple
• Similarly, a matrix $A \in \mathbb{R}^{m \times n}$ is indexed by a pair $(i, j)$, and $A_{ij}$ denotes the entry in row $i$ and column $j$
• All vectors will be treated as column vectors, i.e., they can be
thought of as matrices with $n$ rows and 1 column
• If $x \in \mathbb{R}^n$, $x^\top$ denotes the transpose of $x$, which is a row vector
• If $x, y \in \mathbb{R}^n$, then $x^\top y = \sum_{i=1}^n x_i y_i$ is the dot product between $x$ and $y$ (see the NumPy sketch below)
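A minimal NumPy sketch of these conventions; the particular vectors are made up for illustration:

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])    # a column vector: 3 rows, 1 column
y = np.array([[4.0], [5.0], [6.0]])

print(x.T)                             # the transpose x^T is a row vector (1 x 3)
print((x.T @ y).item())                # dot product x^T y = 1*4 + 2*5 + 3*6 = 32.0
```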
Partial Derivatives
• The partial derivative of a multivariate function $f(x_1, \dots, x_n)$ with respect to
$x_i$, denoted $\frac{\partial f}{\partial x_i}$, is the derivative of $f$ with respect to $x_i$ assuming that all of the
remaining variables are treated as constants (a short symbolic sketch follows below)
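As a quick illustration, a SymPy sketch of partial differentiation; the function $f(x, y) = x^2 y + \sin y$ is a made-up example:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 * y + sp.sin(y)               # hypothetical example function

print(sp.diff(f, x))                   # df/dx = 2*x*y          (y held constant)
print(sp.diff(f, y))                   # df/dy = x**2 + cos(y)  (x held constant)
```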
Multivariate Chain Rule
• Let $f(x_1, \dots, x_n)$ and $x_1(t), \dots, x_n(t)$ be differentiable functions; then
$\frac{d}{dt} f\bigl(x_1(t), \dots, x_n(t)\bigr) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \cdot \frac{d x_i(t)}{dt}$ (a SymPy check follows below)
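A small SymPy check of the rule on a made-up composition; the functions $f$, $x_1(t)$, $x_2(t)$ below are illustrative assumptions, not the lecture's:

```python
import sympy as sp

t = sp.symbols('t', real=True)
x1, x2 = sp.symbols('x1 x2', real=True)

f = x1**2 * x2                 # hypothetical outer function
g1, g2 = sp.cos(t), t**3       # hypothetical inner functions x1(t), x2(t)

# Left-hand side: differentiate the composition directly
lhs = sp.diff(f.subs({x1: g1, x2: g2}), t)

# Right-hand side: sum of partials times derivatives of the inner functions
rhs = (sp.diff(f, x1).subs({x1: g1, x2: g2}) * sp.diff(g1, t)
       + sp.diff(f, x2).subs({x1: g1, x2: g2}) * sp.diff(g2, t))

print(sp.simplify(lhs - rhs))  # 0, as the chain rule predicts
```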
Gradients
• The gradient of $f$ at $x$ is the vector of partial derivatives, denoted $\nabla f(x)$,
whose $i^{\text{th}}$ entry is $\frac{\partial f}{\partial x_i}(x)$ (a numerical sketch follows below)
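A minimal numerical sketch: the analytic gradient of a made-up function compared against central finite differences:

```python
import numpy as np

def f(x):
    # hypothetical example: f(x) = x1^2 + 3*x1*x2
    return x[0]**2 + 3 * x[0] * x[1]

def grad_f(x):
    # analytic gradient: (2*x1 + 3*x2, 3*x1)
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

def numerical_grad(f, x, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([1.0, -2.0])
print(grad_f(x0))               # [-4.  3.]
print(numerical_grad(f, x0))    # approximately the same
```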
Directional Derivatives
• Unlike functions of a single variable, there is not a single
derivative that describes the rate of change at the point
• Consider a function $f(x_1, x_2)$
• Let $g(x_1) = f(x_1, c)$ for some fixed value $c$ of $x_2$
• Then $g'(x_1) = \frac{\partial f}{\partial x_1}(x_1, c)$, which is the slope of $g$ at the point $x_1$
• This means that $\frac{\partial f}{\partial x_1}$ is the slope of the function when
considered only as a function of $x_1$
• Equivalently, this is the slope of $f$ in the direction $v$ where $v_1 = 1$
and all other entries of $v$ are zero
Directional Derivatives
• The directional derivative of a differentiable function $f$ at $x$ along
the vector $v$ is given by $g'(0)$, where $g(t) = f(x + t v)$
• Equivalently, by the multivariate chain rule, the directional derivative is $\nabla f(x)^\top v$
• Example: see the numerical sketch below
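Since the slide's example was not recoverable, here is a hedged numerical sketch with a made-up $f$, point $x_0$, and unit direction $v$, computing the directional derivative both ways:

```python
import numpy as np

def f(x):
    # hypothetical example function
    return x[0]**2 + 3 * x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x0 = np.array([1.0, 2.0])
v = np.array([1.0, 1.0]) / np.sqrt(2)   # a unit-length direction

# Via the chain rule: directional derivative = grad f(x0) . v
print(grad_f(x0) @ v)

# Via the definition: derivative of g(t) = f(x0 + t*v) at t = 0,
# approximated with a central finite difference
h = 1e-6
print((f(x0 + h * v) - f(x0 - h * v)) / (2 * h))
```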
Directional Derivatives
• For differentiable functions, the gradient gives the direction of
steepest ascent in $f$
• It maximizes the directional derivative, $\nabla f(x)^\top u$, over all possible unit-length
directions $u$
• Recall that $a^\top b = \|a\| \, \|b\| \cos\theta$, where $\theta$ is the angle between the two
vectors (in the 2d-plane that contains them both)
• $u$ is unit length if $\|u\| = 1$
• So, $\nabla f(x)^\top u = \|\nabla f(x)\| \cos\theta \leq \|\nabla f(x)\|$
• Choosing $u = \nabla f(x) / \|\nabla f(x)\|$ attains this upper bound (a numerical check follows below)
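A quick numerical sanity check of this claim, using the same made-up $f$ as above: no random unit direction achieves a larger directional derivative than the normalized gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # gradient of the hypothetical f(x) = x1^2 + 3*x1*x2
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x0 = np.array([1.0, 2.0])
g = grad_f(x0)

# Directional derivative along the normalized gradient equals ||grad f(x0)||
best = g @ (g / np.linalg.norm(g))

# Compare against many random unit-length directions
dirs = rng.normal(size=(1000, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(best)                 # sqrt(8^2 + 3^2) ~ 8.544
print((dirs @ g).max())     # no random direction beats it
```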
Local Optima
Critical points can be local minima, local maxima, or saddle points
(figure source: Wikipedia)
Multivariable Calculus Review
Hessian Matrix
• The Hessian of $f$ at $x$ is the matrix of second partial derivatives, denoted $\nabla^2 f(x)$,
whose $(i, j)$ entry is $\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$ (a short symbolic sketch follows below)
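A short SymPy sketch; the function $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ is a made-up example:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 * x2 + x2**3          # hypothetical example function

H = sp.hessian(f, [x1, x2])     # matrix of second partial derivatives
print(H)
# Matrix([[2*x2, 2*x1], [2*x1, 6*x2]])
```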
Constrained Optimization
• How do we minimize a multivariate function with respect to
simple constraints, e.g., minimize $f(x, y)$ subject to $a \leq x \leq b$ and $c \leq y \leq d$?
• This is much more challenging than the univariate case!
• The optima can occur at a critical point, where $\nabla f(x, y) = 0$, or anywhere on the
boundary of the rectangle determined by the constraints! (a numerical sketch follows below)
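To illustrate, here is a small sketch on a made-up instance: the unconstrained critical point lies outside the box, so the constrained minimum lands on the boundary (found here by brute-force grid search):

```python
import numpy as np

# Hypothetical example (not from the slides): minimize
#   f(x, y) = (x - 3)^2 + (y + 1)^2
# subject to 0 <= x <= 1 and 0 <= y <= 1.
def f(x, y):
    return (x - 3)**2 + (y + 1)**2

# The only critical point is where the gradient vanishes:
#   2(x - 3) = 0 and 2(y + 1) = 0  =>  (3, -1),
# which lies outside the feasible rectangle, so the constrained
# minimum must occur on the boundary.

# Brute-force check on a fine grid over the rectangle:
xs = np.linspace(0.0, 1.0, 201)
ys = np.linspace(0.0, 1.0, 201)
X, Y = np.meshgrid(xs, ys)
vals = f(X, Y)
i, j = np.unravel_index(np.argmin(vals), vals.shape)
print("approx. minimizer:", (X[i, j], Y[i, j]))   # ~ (1.0, 0.0), a corner
print("approx. minimum  :", vals[i, j])           # ~ 5.0
```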