Vector and Matrix Calculus
Herman Kamper
[email protected]
30 January 2013
1 Introduction
As explained in detail in [1], there unfortunately exist multiple competing notations for the layout of matrix derivatives. This can cause a lot of difficulty when consulting several sources, since different sources might use different conventions. Some sources, for example [2] (from which I use a lot of identities), even use a mixed layout (according to [1, Notes]). Identities for both the numerator layout (sometimes called the Jacobian formulation) and the denominator layout (sometimes called the Hessian formulation) are given in [1], so this makes it easy to check which layout a particular source uses. I will aim to stick to the denominator layout, which seems to be the most widely used in the field of statistics and pattern recognition (e.g. [3] and [4, pp. 327–332]). Other useful references concerning matrix calculus include [5] and [6]. In this document column vectors are assumed in all cases except where specifically stated otherwise.
Table 1: Derivatives of scalars, vector functions and matrices [1, 6].

                              scalar y                     column vector y ∈ R^m      matrix Y ∈ R^{m×n}
    scalar x                  scalar ∂y/∂x                 row vector ∂y/∂x ∈ R^m     matrix ∂Y/∂x (only numerator layout)
    column vector x ∈ R^n     column vector ∂y/∂x ∈ R^n    matrix ∂y/∂x ∈ R^{n×m}
    matrix X ∈ R^{p×q}        matrix ∂y/∂X ∈ R^{p×q}
2 Definitions
Table 1 indicates the six possible kinds of derivatives when using the denominator layout. Using
this layout notation consistently, we have the following definitions.
The derivative of a scalar function f : R^n → R with respect to vector x ∈ R^n is

\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
  \frac{\partial f(\mathbf{x})}{\partial x_1} \\
  \frac{\partial f(\mathbf{x})}{\partial x_2} \\
  \vdots \\
  \frac{\partial f(\mathbf{x})}{\partial x_n}
\end{bmatrix}
    (1)
This is the transpose of the gradient (some authors simply call this the gradient, irrespective of
whether numerator or denominator layout is used).
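As a quick numerical illustration (this snippet is my addition, not part of the original notes; it assumes numpy, and the function f below is an arbitrary example), the column layout of (1) can be checked with central differences:

    import numpy as np

    def f(x):
        # arbitrary example scalar function f : R^3 -> R
        return x[0]**2 + x[1] * x[2]

    def numerical_gradient(f, x, eps=1e-6):
        # central differences, stacked as a column vector as in (1) (denominator layout)
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = (f(x + e) - f(x - e)) / (2 * eps)
        return g

    x = np.array([1.0, 2.0, 3.0])
    print(numerical_gradient(f, x))   # approximately [2*x_1, x_3, x_2] = [2, 3, 2]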
The derivative of a vector function f : R^n → R^m, where f(x) = [f_1(x)  f_2(x)  ⋯  f_m(x)]^T, with respect to a scalar element x_l of x is the row vector

\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_l} \overset{\text{def}}{=}
\begin{bmatrix}
  \frac{\partial f_1(\mathbf{x})}{\partial x_l} & \frac{\partial f_2(\mathbf{x})}{\partial x_l} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_l}
\end{bmatrix}
    (2)

while its derivative with respect to the vector x ∈ R^n is the n × m matrix

\frac{\partial \mathbf{f}(\mathbf{x})}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
  \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \frac{\partial f_2(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_1} \\
  \frac{\partial f_1(\mathbf{x})}{\partial x_2} & \frac{\partial f_2(\mathbf{x})}{\partial x_2} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_2} \\
  \vdots & \vdots & \ddots & \vdots \\
  \frac{\partial f_1(\mathbf{x})}{\partial x_n} & \frac{\partial f_2(\mathbf{x})}{\partial x_n} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_n}
\end{bmatrix}
    (3)

Similarly, the derivative of a scalar function f : R^{p×q} → R with respect to the matrix X ∈ R^{p×q} is the p × q matrix with (i, j)th element ∂f(X)/∂X_{ij}:

\frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \overset{\text{def}}{=}
\begin{bmatrix}
  \frac{\partial f(\mathbf{X})}{\partial X_{11}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{1q}} \\
  \vdots & \ddots & \vdots \\
  \frac{\partial f(\mathbf{X})}{\partial X_{p1}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{pq}}
\end{bmatrix}
    (4)
Observe that (1) is just a special case of (4) for column vectors. Often (as in [3]) the gradient
notation is used as an alternative to the notation used above, for example:
\nabla_{\mathbf{x}} f(\mathbf{x}) = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}    (5)

\nabla_{\mathbf{X}} f(\mathbf{X}) = \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}}    (6)
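To make the layouts above concrete, here is a small numerical sketch (my addition; it assumes numpy, and the linear function f(x) = Ax is an arbitrary example). It builds the denominator-layout derivative by stacking the row vectors (2) for each x_l, giving the n × m matrix of (3), which for f(x) = Ax is A^T:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])      # A in R^{2x3}, so f : R^3 -> R^2 below

    def f(x):
        # arbitrary example vector function, f(x) = Ax
        return A @ x

    def numerical_derivative(f, x, eps=1e-6):
        # denominator layout: row l is the row vector (2), so the result is the
        # n x m matrix of (3)
        n, m = x.size, f(x).size
        D = np.zeros((n, m))
        for l in range(n):
            e = np.zeros(n)
            e[l] = eps
            D[l, :] = (f(x + e) - f(x - e)) / (2 * eps)
        return D

    x = np.array([1.0, -1.0, 2.0])
    print(numerical_derivative(f, x).shape)              # (3, 2), i.e. n x m
    print(np.allclose(numerical_derivative(f, x), A.T))  # True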
3 Identities
Suppose u = u(x) ∈ R^m and v = v(x) ∈ R^n are vector functions of x, and that A ∈ R^{m×n} does not depend on x. We can write

\mathbf{u}^T \mathbf{A} \mathbf{v} = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i v_j    (7)

and we want to find

\frac{\partial \mathbf{u}^T \mathbf{A} \mathbf{v}}{\partial \mathbf{x}}    (8)
From (7), we have:
\left[ \frac{\partial \mathbf{u}^T \mathbf{A} \mathbf{v}}{\partial \mathbf{x}} \right]_l
= \frac{\partial \mathbf{u}^T \mathbf{A} \mathbf{v}}{\partial x_l}
= \frac{\partial}{\partial x_l} \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i v_j

= \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{\partial}{\partial x_l} A_{ij} u_i v_j

= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \left( \frac{\partial u_i}{\partial x_l} v_j + u_i \frac{\partial v_j}{\partial x_l} \right)

= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i \frac{\partial v_j}{\partial x_l}    (9)
Now we can show (by writing out the elements [Notebook, 2012-05-22]) that:
\left[ \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A} \mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{A}^T \mathbf{u} \right]_l
= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^{m} \sum_{j=1}^{n} (\mathbf{A}^T)_{ji} u_i \frac{\partial v_j}{\partial x_l}

= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i \frac{\partial v_j}{\partial x_l}    (10)
Since (9) and (10) agree for every element l, we have the identity

\frac{\partial \mathbf{u}^T \mathbf{A} \mathbf{v}}{\partial \mathbf{x}}
= \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A} \mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{A}^T \mathbf{u}    (11)
Further useful identities include:

\frac{\partial \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}^T    (15)

\frac{\partial \mathbf{a}}{\partial \mathbf{x}} = \mathbf{0}    (16)

\frac{\partial \mathbf{x}^T \mathbf{A} \mathbf{b}}{\partial \mathbf{x}} = \mathbf{A} \mathbf{b}    (17)
\frac{\partial \mathbf{x}^T \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^T) \mathbf{x}    (18)

\frac{\partial \mathbf{x}^T \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = 2 \mathbf{A} \mathbf{x} \quad \text{if } \mathbf{A} \text{ is symmetric}    (19)
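These identities are also easy to confirm numerically. The sketch below is my addition (it assumes numpy; the random A, b, x and the grad helper are arbitrary choices of mine) and checks (17)–(19) with central differences:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = rng.standard_normal(n)

    def grad(f, x, eps=1e-6):
        # central-difference gradient, laid out as a column vector as in (1)
        return np.array([(f(x + e) - f(x - e)) / (2 * eps)
                         for e in eps * np.eye(x.size)])

    print(np.allclose(grad(lambda z: z @ A @ b, x), A @ b))          # checks (17)
    print(np.allclose(grad(lambda z: z @ A @ z, x), (A + A.T) @ x))  # checks (18)
    S = A + A.T                                                      # a symmetric matrix
    print(np.allclose(grad(lambda z: z @ S @ z, x), 2 * S @ x))      # checks (19)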
See [7, p. 374] for the definition of cofactors. Also see [Notebook, 2012-05-22].
We can write the determinant of matrix X ∈ R^{n×n}, expanding along row i, as

|\mathbf{X}| = X_{i1} C_{i1} + X_{i2} C_{i2} + \cdots + X_{in} C_{in} = \sum_{j=1}^{n} X_{ij} C_{ij}    (20)

where C_{ij} denotes the cofactor of X_{ij},
thus, since the cofactor C_{ij} does not depend on the element X_{ij},

\frac{\partial |\mathbf{X}|}{\partial X_{ij}} = C_{ij}    (21)

and, collecting these elements into a matrix,

\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{C}    (22)

where C is the matrix of cofactors of X. The adjugate of X is the transpose of the cofactor matrix,

\operatorname{adj} \mathbf{X} = \mathbf{C}^T    (23)

and it satisfies

\operatorname{adj} \mathbf{X} = |\mathbf{X}| \mathbf{X}^{-1}    (24)
which, when substituted into (22), results in the identity
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}| (\mathbf{X}^{-1})^T    (25)
By the chain rule it then also follows that

\frac{\partial \ln |\mathbf{X}|}{\partial \mathbf{X}} = \frac{1}{|\mathbf{X}|} \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}}    (26)

so that, using (25),

\frac{\partial \ln |\mathbf{X}|}{\partial \mathbf{X}} = (\mathbf{X}^{-1})^T    (27)
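Again as a numerical sanity check (my addition; it assumes numpy, and the shifted random matrix X and the matrix_grad helper are arbitrary choices of mine), identities (25) and (27) can be verified with element-wise central differences:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    X = rng.standard_normal((n, n)) + 5 * np.eye(n)   # shift keeps X nonsingular with det(X) > 0

    def matrix_grad(f, X, eps=1e-6):
        # element-wise central differences: entry (i, j) approximates df(X)/dX_ij
        G = np.zeros_like(X)
        for i in range(X.shape[0]):
            for j in range(X.shape[1]):
                E = np.zeros_like(X)
                E[i, j] = eps
                G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
        return G

    det = np.linalg.det
    inv_T = np.linalg.inv(X).T
    print(np.allclose(matrix_grad(det, X), det(X) * inv_T))               # checks (25)
    print(np.allclose(matrix_grad(lambda Y: np.log(det(Y)), X), inv_T))   # checks (27)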
References