Matrix Calculus
LI Xiucheng
Background
Matrix Derivative
Machine Learning Examples
Notation
We denote scalars by lowercase letters (e.g., $a$, $x$), vectors by lowercase letters such as $w$, $y$, and $\mu$, and matrices by uppercase letters (e.g., $A$, $X$, $\Sigma$). $X^\top$ denotes the transpose of $X$, $\operatorname{tr}(X)$ its trace, and $\langle \cdot, \cdot \rangle$ the Frobenius inner product.
Background
Vector and Matrix Product
• $\langle X, Y\rangle = \operatorname{tr}(X^\top Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij} Y_{ij}$, for $X, Y \in \mathbb{R}^{m \times n}$.
Remark: this is the Frobenius inner product; it generalizes the Euclidean inner product on vectors to matrices.
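A quick numerical sanity check of this identity (a sketch in NumPy; the $3 \times 4$ shapes are an arbitrary choice):

```python
import numpy as np

# Check <X, Y> = tr(X^T Y) = sum_ij X_ij Y_ij on random matrices.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
Y = rng.standard_normal((3, 4))

assert np.isclose(np.trace(X.T @ Y), np.sum(X * Y))
```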
Properties of Frobenius inner product
• $\langle X, Y\rangle = \langle Y, X\rangle$.
• $\langle aX, Y\rangle = \langle X, aY\rangle = a\langle X, Y\rangle$.
• $\langle X + Z, Y\rangle = \langle X, Y\rangle + \langle Z, Y\rangle$.
• $\langle X, YZ\rangle = \langle Y^\top X, Z\rangle$.
• $\langle X, YZ\rangle = \langle XZ^\top, Y\rangle$.
Remark
• The first of the last two equations moves the left factor $Y$ to the left argument by transposing it.
• The second moves the right factor $Z$ to the right argument by transposing it.
Proof.
The first three equations follow directly from the definition of the inner product; the last two use the fact that $\operatorname{tr}(XY) = \operatorname{tr}(YX)$ holds for any two matrices $X$, $Y$ such that $X^\top$ has the same size as $Y$. For example, $\langle X, YZ\rangle = \operatorname{tr}(X^\top YZ) = \operatorname{tr}\big((Y^\top X)^\top Z\big) = \langle Y^\top X, Z\rangle$.
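These two identities can also be checked numerically; in the sketch below the shapes are arbitrary but chosen so that every product is defined:

```python
import numpy as np

# Check <X, YZ> = <Y^T X, Z> = <X Z^T, Y> on random matrices.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 5))
Y = rng.standard_normal((3, 4))
Z = rng.standard_normal((4, 5))

def inner(A, B):
    # Frobenius inner product <A, B> = tr(A^T B).
    return np.trace(A.T @ B)

a = inner(X, Y @ Z)       # <X, YZ>
b = inner(Y.T @ X, Z)     # <Y^T X, Z>
c = inner(X @ Z.T, Y)     # <X Z^T, Y>
assert np.isclose(a, b) and np.isclose(a, c)
```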
Matrix Derivative
Let us denote $f = f(X) \in \mathbb{R}$. First, consider a scalar $x$; we have
$$df = f'(x)\,dx. \qquad (1)$$
Matrix Differentiation Rules
The basic rules parallel the scalar case ($a$ a scalar; $X$, $Y$ matrices of compatible sizes):
• $d(aX) = a\,dX$ and $d(X + Y) = dX + dY$.
• $d(XY) = (dX)\,Y + X\,dY$.
• $d(X^\top) = (dX)^\top$.
• $d\operatorname{tr}(X) = \operatorname{tr}(dX)$.
• $d(X^{-1}) = -X^{-1}(dX)X^{-1}$.
• $d\log|X| = \operatorname{tr}(X^{-1}\,dX)$.
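The last two rules are the least obvious; below is a finite-difference sanity check (a sketch; the perturbation scale $10^{-6}$ and the well-conditioned test matrix are arbitrary choices):

```python
import numpy as np

# Finite-difference check of d log|X| = tr(X^{-1} dX)
# and d(X^{-1}) = -X^{-1} (dX) X^{-1}.
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # keep X well-conditioned
dX = 1e-6 * rng.standard_normal((4, 4))          # a small perturbation

# log-determinant rule
lhs = np.linalg.slogdet(X + dX)[1] - np.linalg.slogdet(X)[1]
rhs = np.trace(np.linalg.solve(X, dX))           # tr(X^{-1} dX)
assert np.isclose(lhs, rhs, atol=1e-9)

# inverse rule
Xinv = np.linalg.inv(X)
assert np.allclose(np.linalg.inv(X + dX) - Xinv, -Xinv @ dX @ Xinv, atol=1e-9)
```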
Method
The key idea is to use the properties of the inner product and the matrix differentiation rules to bring the differential into the inner-product form
$$df = \langle \nabla_X f, dX\rangle,$$
from which the gradient $\nabla_X f$ can be read off directly.
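As an illustration of the method, a minimal finite-difference checker for a candidate gradient (a sketch; check_gradient is a hypothetical helper name, not from the slides):

```python
import numpy as np

# Given f and a candidate gradient grad, verify that
# f(X + dX) - f(X) ~ <grad(X), dX> for a small random perturbation dX.
def check_gradient(f, grad, X, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    dX = eps * rng.standard_normal(X.shape)
    df = f(X + dX) - f(X)          # actual change in f
    pred = np.sum(grad(X) * dX)    # predicted change <grad f, dX>
    return np.isclose(df, pred, rtol=1e-4, atol=1e-10)

# Example: f(X) = <X, X> has gradient 2X.
X = np.arange(6.0).reshape(2, 3)
assert check_gradient(lambda X: np.sum(X * X), lambda X: 2 * X, X)
```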
Machine Learning Examples
Quadratic Function Optimization
Let $f(x) = \langle x, Ax\rangle = x^\top A x$. Then
$$df = \langle dx, Ax\rangle + \langle x, A\,dx\rangle = \langle Ax, dx\rangle + \langle A^\top x, dx\rangle.$$
Hence,
$$\nabla_x f = Ax + A^\top x.$$
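A finite-difference check of this gradient (a sketch; the size $5$ and the perturbation scale are arbitrary choices):

```python
import numpy as np

# Check that grad f = A x + A^T x for f(x) = x^T A x.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
x = rng.standard_normal(5)
dx = 1e-6 * rng.standard_normal(5)

f = lambda v: v @ A @ v
df = f(x + dx) - f(x)                  # actual change
pred = (A @ x + A.T @ x) @ dx          # <A x + A^T x, dx>
assert np.isclose(df, pred)
```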
Linear Regression
$$f(w) = \langle Xw - y, Xw - y\rangle.$$
Then
$$df = \langle X\,dw, Xw - y\rangle + \langle Xw - y, X\,dw\rangle = \big\langle 2X^\top(Xw - y), dw\big\rangle.$$
Hence,
$$\nabla_w f = 2X^\top(Xw - y),$$
and, assuming $X^\top X$ is invertible,
$$\nabla_w f = 0 \implies w^* = (X^\top X)^{-1} X^\top y.$$
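The closed form can be cross-checked against NumPy's least-squares solver (a sketch; the $50 \times 3$ design matrix is an arbitrary choice):

```python
import numpy as np

# At w* = (X^T X)^{-1} X^T y the gradient 2 X^T (X w - y) should vanish,
# and w* should match the least-squares solution.
rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3))
y = rng.standard_normal(50)

w_star = np.linalg.solve(X.T @ X, X.T @ y)
grad = 2 * X.T @ (X @ w_star - y)
assert np.allclose(grad, 0, atol=1e-10)
assert np.allclose(w_star, np.linalg.lstsq(X, y, rcond=None)[0])
```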
Softmax Regression
Estimating the Covariance of Gaussian Distribution
Given samples $x_1, \dots, x_N$ from a Gaussian with mean $\mu$, maximizing the likelihood over $\Sigma$ amounts (up to constants) to minimizing
$$f(\Sigma) = \log|\Sigma| + \frac{1}{N}\sum_{i=1}^{N} \left\langle x_i - \mu,\ \Sigma^{-1}(x_i - \mu)\right\rangle.$$
The first term gives $d\log|\Sigma| = \operatorname{tr}(\Sigma^{-1}\,d\Sigma) = \langle \Sigma^{-\top}, d\Sigma\rangle$.
Let $S = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)(x_i - \mu)^\top$; then the second term equals $\operatorname{tr}(\Sigma^{-1} S)$, and
$$d\operatorname{tr}(\Sigma^{-1} S) = \operatorname{tr}\big({-\Sigma^{-1}(d\Sigma)\Sigma^{-1} S}\big) = \big\langle {-(\Sigma^{-1} S \Sigma^{-1})^\top}, d\Sigma\big\rangle.$$
Hence,
$$\nabla_\Sigma f = (\Sigma^{-1} - \Sigma^{-1} S \Sigma^{-1})^\top,$$
and $\nabla_\Sigma f = 0 \implies \Sigma^* = S$, the sample covariance.
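A sketch that checks this gradient by finite differences and confirms it vanishes at $\Sigma = S$ (the data sizes and the SPD construction of $\Sigma$ are arbitrary choices):

```python
import numpy as np

# Finite-difference check of grad f = (S^{-1}-type expression) for
# f(Sigma) = log|Sigma| + tr(Sigma^{-1} S).
rng = np.random.default_rng(5)
d, N = 3, 100
Xs = rng.standard_normal((N, d))
mu = Xs.mean(axis=0)
S = (Xs - mu).T @ (Xs - mu) / N                 # sample covariance

B = rng.standard_normal((d, d))
Sigma = B @ B.T + d * np.eye(d)                 # a random SPD matrix

def f(Sig):
    return np.linalg.slogdet(Sig)[1] + np.trace(np.linalg.solve(Sig, S))

dSig = 1e-6 * rng.standard_normal((d, d))
Sinv = np.linalg.inv(Sigma)
grad = (Sinv - Sinv @ S @ Sinv).T
assert np.isclose(f(Sigma + dSig) - f(Sigma), np.sum(grad * dSig))

# The gradient vanishes at Sigma = S, the maximum-likelihood estimate.
Sinv_S = np.linalg.inv(S)
assert np.allclose((Sinv_S - Sinv_S @ S @ Sinv_S).T, 0, atol=1e-10)
```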