
Vector and Matrix Calculus

Herman Kamper
[email protected]

30 January 2013

1 Introduction

As explained in detail in [1], there unfortunately exist multiple competing notations for the layout of matrix derivatives. This can cause considerable difficulty when consulting several sources, since different sources may follow different conventions. Some sources, for example [2] (from which I take many identities), even use a mixed layout (according to [1, Notes]). Identities for both the numerator layout (sometimes called the Jacobian formulation) and the denominator layout (sometimes called the Hessian formulation) are given in [1], which makes it easy to check which layout a particular source uses. I will aim to stick to the denominator layout, which seems to be the most widely used in the fields of statistics and pattern recognition (e.g. [3] and [4, pp. 327–332]). Other useful references on matrix calculus include [5] and [6]. In this document column vectors are assumed in all cases except where specifically stated otherwise.
Table 1: Derivatives of scalars, vector functions and matrices [1, 6].

|  | scalar $y$ | column vector $\mathbf{y} \in \mathbb{R}^m$ | matrix $\mathbf{Y} \in \mathbb{R}^{m \times n}$ |
| --- | --- | --- | --- |
| scalar $x$ | scalar $\frac{\partial y}{\partial x}$ | row vector $\frac{\partial \mathbf{y}}{\partial x} \in \mathbb{R}^{1 \times m}$ | matrix $\frac{\partial \mathbf{Y}}{\partial x}$ (only numerator layout) |
| column vector $\mathbf{x} \in \mathbb{R}^n$ | column vector $\frac{\partial y}{\partial \mathbf{x}} \in \mathbb{R}^n$ | matrix $\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{n \times m}$ |  |
| matrix $\mathbf{X} \in \mathbb{R}^{p \times q}$ | matrix $\frac{\partial y}{\partial \mathbf{X}} \in \mathbb{R}^{p \times q}$ |  |  |
2 Definitions

Table 1 indicates the six possible kinds of derivatives when using the denominator layout. Using
this layout notation consistently, we have the following definitions.
The derivative of a scalar function $f : \mathbb{R}^n \to \mathbb{R}$ with respect to vector $\mathbf{x} \in \mathbb{R}^n$ is

$$
\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f(\mathbf{x})}{\partial x_1} \\
\frac{\partial f(\mathbf{x})}{\partial x_2} \\
\vdots \\
\frac{\partial f(\mathbf{x})}{\partial x_n}
\end{bmatrix} \tag{1}
$$

This is the transpose of the gradient (some authors simply call this the gradient, irrespective of
whether numerator or denominator layout is used).

The derivative of a vector function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, where $\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) & f_2(\mathbf{x}) & \cdots & f_m(\mathbf{x}) \end{bmatrix}^\mathsf{T}$ and $\mathbf{x} \in \mathbb{R}^n$, with respect to scalar $x_i$ is

$$
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_i} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f_1(\mathbf{x})}{\partial x_i} & \frac{\partial f_2(\mathbf{x})}{\partial x_i} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_i}
\end{bmatrix} \tag{2}
$$

The derivative of a vector function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, where $\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) & f_2(\mathbf{x}) & \cdots & f_m(\mathbf{x}) \end{bmatrix}^\mathsf{T}$, with respect to vector $\mathbf{x} \in \mathbb{R}^n$ is

$$
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_1} \\
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_2} \\
\vdots \\
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f_1(\mathbf{x})}{\partial x_1} & \frac{\partial f_2(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_1} \\
\frac{\partial f_1(\mathbf{x})}{\partial x_2} & \frac{\partial f_2(\mathbf{x})}{\partial x_2} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_1(\mathbf{x})}{\partial x_n} & \frac{\partial f_2(\mathbf{x})}{\partial x_n} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_n}
\end{bmatrix} \tag{3}
$$

This is just the transpose of the Jacobian matrix.
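
To make this layout concrete, here is a minimal numerical sketch of definition (3) (assuming NumPy is available; the test function $\mathbf{f}$ and the step size $h$ are arbitrary illustrative choices). It approximates the derivative with central differences and confirms that the result is $n \times m$, the transpose of the usual $m \times n$ Jacobian:

```python
import numpy as np

def f(x):
    # An arbitrary vector function f : R^2 -> R^3, used only for illustration.
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def denom_layout_derivative(f, x, h=1e-6):
    """Approximate (3): row i holds df_1/dx_i ... df_m/dx_i (an n x m matrix)."""
    n, m = x.size, f(x).size
    D = np.zeros((n, m))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        D[i, :] = (f(x + e) - f(x - e)) / (2.0 * h)  # central differences
    return D

x = np.array([0.3, -1.2])
D = denom_layout_derivative(f, x)
print(D.shape)  # (2, 3): n x m, i.e. the transpose of the m x n Jacobian
```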


The derivative of a scalar function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ with respect to matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ is

$$
\frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f(\mathbf{X})}{\partial X_{11}} & \frac{\partial f(\mathbf{X})}{\partial X_{12}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{1n}} \\
\frac{\partial f(\mathbf{X})}{\partial X_{21}} & \frac{\partial f(\mathbf{X})}{\partial X_{22}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f(\mathbf{X})}{\partial X_{m1}} & \frac{\partial f(\mathbf{X})}{\partial X_{m2}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{mn}}
\end{bmatrix} \tag{4}
$$

Observe that (1) is just a special case of (4) for column vectors. Often (as in [3]) the gradient notation is used as an alternative to the notation used above, for example:

$$
\nabla_{\mathbf{x}} f(\mathbf{x}) = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \tag{5}
$$

$$
\nabla_{\mathbf{X}} f(\mathbf{X}) = \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \tag{6}
$$
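
As a quick numerical sanity check of definition (4), the sketch below (again assuming NumPy) perturbs one entry of $\mathbf{X}$ at a time. The test function $f(\mathbf{X}) = \operatorname{tr}(\mathbf{X}\mathbf{X}^\mathsf{T}) = \sum_{i,j} X_{ij}^2$ is an arbitrary choice, for which (4) gives $2\mathbf{X}$ entry by entry:

```python
import numpy as np

def scalar_by_matrix_derivative(f, X, h=1e-6):
    """Approximate (4): G[k, l] = df/dX_kl, with the same shape as X."""
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, l] = h
            G[k, l] = (f(X + E) - f(X - E)) / (2.0 * h)  # central differences
    return G

X = np.random.default_rng(0).standard_normal((3, 2))
f = lambda M: np.trace(M @ M.T)  # f(X) = sum_ij X_ij^2, so (4) gives 2X
print(np.allclose(scalar_by_matrix_derivative(f, X), 2 * X, atol=1e-5))  # True
```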

3 Identities

3.1 Scalar-by-vector product rule

If $\mathbf{a} \in \mathbb{R}^m$, $\mathbf{b} \in \mathbb{R}^n$ and $\mathbf{C} \in \mathbb{R}^{m \times n}$ then

$$
\mathbf{a}^\mathsf{T} \mathbf{C} \mathbf{b}
= \sum_{i=1}^m a_i (\mathbf{C}\mathbf{b})_i
= \sum_{i=1}^m a_i \left( \sum_{j=1}^n C_{ij} b_j \right)
= \sum_{i=1}^m \sum_{j=1}^n C_{ij} a_i b_j \tag{7}
$$

Now assume we have vector functions $\mathbf{u} : \mathbb{R}^q \to \mathbb{R}^m$ and $\mathbf{v} : \mathbb{R}^q \to \mathbb{R}^n$, and a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$. The vector functions $\mathbf{u}$ and $\mathbf{v}$ are functions of $\mathbf{x} \in \mathbb{R}^q$, but $\mathbf{A}$ is not. We want to find an identity for

$$
\frac{\partial \mathbf{u}^\mathsf{T} \mathbf{A} \mathbf{v}}{\partial \mathbf{x}} \tag{8}
$$

From (7), we have:

$$
\begin{aligned}
\left[ \frac{\partial \mathbf{u}^\mathsf{T} \mathbf{A} \mathbf{v}}{\partial \mathbf{x}} \right]_l
= \frac{\partial \mathbf{u}^\mathsf{T} \mathbf{A} \mathbf{v}}{\partial x_l}
&= \frac{\partial}{\partial x_l} \sum_{i=1}^m \sum_{j=1}^n A_{ij} u_i v_j \\
&= \sum_{i=1}^m \sum_{j=1}^n \frac{\partial}{\partial x_l} \left( A_{ij} u_i v_j \right) \\
&= \sum_{i=1}^m \sum_{j=1}^n A_{ij} \left( \frac{\partial u_i}{\partial x_l} v_j + u_i \frac{\partial v_j}{\partial x_l} \right) \\
&= \sum_{i=1}^m \sum_{j=1}^n A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^m \sum_{j=1}^n A_{ij} u_i \frac{\partial v_j}{\partial x_l}
\end{aligned} \tag{9}
$$

Now we can show (by writing out the elements [Notebook, 2012-05-22]) that:

$$
\begin{aligned}
\left[ \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A} \mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{A}^\mathsf{T} \mathbf{u} \right]_l
&= \sum_{i=1}^m \sum_{j=1}^n A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^m \sum_{j=1}^n (\mathbf{A}^\mathsf{T})_{ji} u_i \frac{\partial v_j}{\partial x_l} \\
&= \sum_{i=1}^m \sum_{j=1}^n A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^m \sum_{j=1}^n A_{ij} u_i \frac{\partial v_j}{\partial x_l}
\end{aligned} \tag{10}
$$

A comparison of (9) and (10) completes the proof that

$$
\frac{\partial \mathbf{u}^\mathsf{T} \mathbf{A} \mathbf{v}}{\partial \mathbf{x}}
= \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A} \mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{A}^\mathsf{T} \mathbf{u} \tag{11}
$$
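
Identity (11) is easy to check numerically. The following sketch (assuming NumPy; the smooth test functions $\mathbf{u}$, $\mathbf{v}$ and all dimensions are arbitrary choices) builds the denominator-layout derivatives of (3) by central differences and compares the two sides of (11):

```python
import numpy as np

rng = np.random.default_rng(1)
q, m, n = 4, 3, 2
A = rng.standard_normal((m, n))
Bu = rng.standard_normal((m, q))  # arbitrary parameters for the test functions
Bv = rng.standard_normal((n, q))
u = lambda x: np.sin(Bu @ x)      # u : R^q -> R^m, smooth and nonlinear
v = lambda x: np.cos(Bv @ x)      # v : R^q -> R^n

def D(f, x, h=1e-6):
    """Denominator-layout derivative (3) by central differences."""
    rows = []
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        rows.append((f(x + e) - f(x - e)) / (2.0 * h))
    return np.array(rows)  # shape: len(x) by len(f(x))

x = rng.standard_normal(q)
lhs = D(lambda z: np.atleast_1d(u(z) @ A @ v(z)), x).ravel()  # d(u^T A v)/dx
rhs = D(u, x) @ A @ v(x) + D(v, x) @ A.T @ u(x)               # right side of (11)
print(np.allclose(lhs, rhs, atol=1e-6))  # True
```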

3.2 Useful identities from scalar-by-vector product rule

From (11) it follows, with vectors and matrices $\mathbf{b} \in \mathbb{R}^m$, $\mathbf{d} \in \mathbb{R}^q$, $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{B} \in \mathbb{R}^{m \times n}$, $\mathbf{C} \in \mathbb{R}^{m \times q}$ and $\mathbf{D} \in \mathbb{R}^{q \times n}$, that

$$
\frac{\partial (\mathbf{B}\mathbf{x} + \mathbf{b})^\mathsf{T} \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d})}{\partial \mathbf{x}}
= \frac{\partial (\mathbf{B}\mathbf{x} + \mathbf{b})}{\partial \mathbf{x}} \, \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d})
+ \frac{\partial (\mathbf{D}\mathbf{x} + \mathbf{d})}{\partial \mathbf{x}} \, \mathbf{C}^\mathsf{T} (\mathbf{B}\mathbf{x} + \mathbf{b}) \tag{12}
$$

resulting in the identity:

$$
\frac{\partial (\mathbf{B}\mathbf{x} + \mathbf{b})^\mathsf{T} \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d})}{\partial \mathbf{x}}
= \mathbf{B}^\mathsf{T} \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d}) + \mathbf{D}^\mathsf{T} \mathbf{C}^\mathsf{T} (\mathbf{B}\mathbf{x} + \mathbf{b}) \tag{13}
$$

by using the easily verifiable identities:

$$
\frac{\partial (\mathbf{u}(\mathbf{x}) + \mathbf{v}(\mathbf{x}))}{\partial \mathbf{x}}
= \frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}(\mathbf{x})}{\partial \mathbf{x}} \tag{14}
$$

$$
\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}^\mathsf{T} \tag{15}
$$

$$
\frac{\partial \mathbf{a}}{\partial \mathbf{x}} = \mathbf{0} \tag{16}
$$

Some other useful special cases of (11):

$$
\frac{\partial \mathbf{x}^\mathsf{T} \mathbf{A} \mathbf{b}}{\partial \mathbf{x}} = \mathbf{A}\mathbf{b} \tag{17}
$$

$$
\frac{\partial \mathbf{x}^\mathsf{T} \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^\mathsf{T})\mathbf{x} \tag{18}
$$

$$
\frac{\partial \mathbf{x}^\mathsf{T} \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x} \quad \text{if } \mathbf{A} \text{ is symmetric} \tag{19}
$$
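
These special cases can also be verified numerically. The sketch below (assuming NumPy; the random $\mathbf{A}$ and $\mathbf{x}$ are arbitrary) checks (18), and (19) for a symmetric matrix, against the finite-difference version of definition (1):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def grad(f, x, h=1e-6):
    """Finite-difference version of definition (1)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)  # central differences
    return g

# Identity (18): the gradient of x^T A x is (A + A^T) x.
print(np.allclose(grad(lambda z: z @ A @ z, x), (A + A.T) @ x, atol=1e-5))  # True

# Identity (19): for a symmetric matrix S this reduces to 2 S x.
S = A + A.T
print(np.allclose(grad(lambda z: z @ S @ z, x), 2 * S @ x, atol=1e-5))  # True
```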

3.3 Derivatives of determinant

See [7, p. 374] for the definition of cofactors. Also see [Notebook, 2012-05-22].
We can write the determinant of matrix $\mathbf{X} \in \mathbb{R}^{n \times n}$ as the cofactor expansion along any row $i$:

$$
|\mathbf{X}| = X_{i1} C_{i1} + X_{i2} C_{i2} + \dots + X_{in} C_{in} = \sum_{j=1}^n X_{ij} C_{ij} \tag{20}
$$

Thus the derivative will be

$$
\begin{aligned}
\left[ \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} \right]_{kl}
&= \frac{\partial}{\partial X_{kl}} \left\{ X_{i1} C_{i1} + X_{i2} C_{i2} + \dots + X_{in} C_{in} \right\} \\
&= \frac{\partial}{\partial X_{kl}} \left\{ X_{k1} C_{k1} + X_{k2} C_{k2} + \dots + X_{kn} C_{kn} \right\}
&& \text{(we can choose $i$ to be any row, so choose $i = k$)} \\
&= C_{kl}
\end{aligned} \tag{21}
$$

The last step uses the fact that the cofactors $C_{k1}, \dots, C_{kn}$ do not depend on the entries in row $k$ of $\mathbf{X}$.

Thus (see [7, p. 386])

$$
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \operatorname{cofactor} \mathbf{X} = (\operatorname{adj} \mathbf{X})^\mathsf{T} \tag{22}
$$

But we know that the inverse of $\mathbf{X}$ is given by [7, p. 387]

$$
\mathbf{X}^{-1} = \frac{1}{|\mathbf{X}|} \operatorname{adj} \mathbf{X} \tag{23}
$$

thus

$$
\operatorname{adj} \mathbf{X} = |\mathbf{X}| \mathbf{X}^{-1} \tag{24}
$$
which, when substituted into (22), results in the identity

$$
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}| (\mathbf{X}^{-1})^\mathsf{T} \tag{25}
$$

From (25) we can also write

$$
\left[ \frac{\partial \ln |\mathbf{X}|}{\partial \mathbf{X}} \right]_{kl}
= \frac{\partial \ln |\mathbf{X}|}{\partial X_{kl}}
= \frac{1}{|\mathbf{X}|} \frac{\partial |\mathbf{X}|}{\partial X_{kl}}
= \left[ \frac{1}{|\mathbf{X}|} \, |\mathbf{X}| (\mathbf{X}^{-1})^\mathsf{T} \right]_{kl} \tag{26}
$$

giving the identity

$$
\frac{\partial \ln |\mathbf{X}|}{\partial \mathbf{X}} = (\mathbf{X}^{-1})^\mathsf{T} \tag{27}
$$
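
Both determinant identities can be checked numerically against the finite-difference version of definition (4). In this sketch (assuming NumPy) $\mathbf{X}$ is constructed to be positive definite so that $|\mathbf{X}| > 0$ and the logarithm in (27) is defined:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
X = A @ A.T + np.eye(n)  # symmetric positive definite, so |X| > 0

def matrix_grad(f, X, h=1e-6):
    """Finite-difference version of definition (4)."""
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, l] = h
            G[k, l] = (f(X + E) - f(X - E)) / (2.0 * h)  # central differences
    return G

inv_XT = np.linalg.inv(X).T

# Identity (25): d|X|/dX = |X| (X^{-1})^T
print(np.allclose(matrix_grad(np.linalg.det, X),
                  np.linalg.det(X) * inv_XT, atol=1e-4))  # True

# Identity (27): d ln|X|/dX = (X^{-1})^T
print(np.allclose(matrix_grad(lambda M: np.log(np.linalg.det(M)), X),
                  inv_XT, atol=1e-5))  # True
```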

References

[1] Matrix calculus. [Online]. Available: http://en.wikipedia.org/wiki/Matrix_calculus


[2] K. B. Petersen and M. S. Pedersen, “The matrix cookbook,” 2008.
[3] A. Ng, Machine Learning. Class notes for CS229, Stanford Engineering Everywhere,
Stanford University, 2008. [Online]. Available: http://see.stanford.edu
[4] S. R. Searle, Matrix Algebra Useful for Statistics. New York, NY: John Wiley & Sons, 1982.
[5] J. R. Schott, Matrix Analysis for Statistics. New York, NY: John Wiley & Sons, 1996.
[6] T. P. Minka, “Old and new matrix algebra useful for statistics,” 2000. [Online]. Available:
http://research.microsoft.com/en-us/um/people/minka/papers/matrix
[7] D. G. Zill and M. R. Cullen, Advanced Engineering Mathematics, 3rd ed. Jones and Bartlett,
2006.
