

A Tutorial on Principal Component Analysis (PCA) and Multiple Discriminant Analysis (MDA)

Dr. Rahul Das Gupta
1. Covariance Matrix
Given a set of feature vectors $\{X^{(k)}\}_{k=1}^{N} \subset \mathbb{R}^{n}$,

where
$X^{(k)} = k$-th feature vector $= \left[x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)}\right]^{T} \in \mathbb{R}^{n}$,
$x_i^{(k)} = i$-th component of the $k$-th feature vector,
$n$ = dimension of each feature vector,
$N$ = number of feature vectors.

Mean vector:
$$\mu = E(X) = \frac{1}{N}\sum_{k=1}^{N} X^{(k)}, \qquad \text{where } \mu = \left[\mu_1, \mu_2, \ldots, \mu_n\right]^{T} \in \mathbb{R}^{n}.$$

Covariance matrix:
$$\Sigma = E\!\left[(X-\mu)(X-\mu)^{T}\right] = \frac{1}{N}\sum_{k=1}^{N}\left(X^{(k)}-\mu\right)\left(X^{(k)}-\mu\right)^{T}$$

$$\Sigma = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n}\\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{bmatrix} \in \mathbb{R}^{n \times n},$$

where $\sigma_{ij}$ = covariance between the $i$-th and $j$-th components of the sample feature vectors,
$$\sigma_{ij} = \frac{1}{N}\sum_{k=1}^{N}\left(x_i^{(k)}-\mu_i\right)\left(x_j^{(k)}-\mu_j\right), \qquad \sigma_{ji} = \sigma_{ij},$$
and $\sigma_{ii}$ = variance of the $i$-th component of the feature vectors,
$$\sigma_{ii} = \frac{1}{N}\sum_{k=1}^{N}\left(x_i^{(k)}-\mu_i\right)^{2}.$$
2. Dimension Reduction Procedure using KL-Transform:

Step #1: Computation of the mean vector $\mu \in \mathbb{R}^{n}$ and the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ for the given set of feature vectors $\{X^{(k)}\}_{k=1}^{N} \subset \mathbb{R}^{n}$. Here $N$ = number of feature vectors, $n$ = dimension of each feature vector, and $X^{(k)}$, $x_i^{(k)}$, $\mu$, $\Sigma$, $\sigma_{ij}$ are exactly as defined in Section 1:
$$\mu = E(X) = \frac{1}{N}\sum_{k=1}^{N} X^{(k)}, \qquad
\Sigma = E\!\left[(X-\mu)(X-\mu)^{T}\right] = \frac{1}{N}\sum_{k=1}^{N}\left(X^{(k)}-\mu\right)\left(X^{(k)}-\mu\right)^{T}.$$
Step #2: Determination of the eigenvalues of the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ in descending order.

The eigenvalues of the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ arranged in descending order are the following:
$$\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n,$$
where $\Sigma \hat{e}_i = \lambda_i \hat{e}_i$ for all $i = 1, 2, \ldots, n$, and $\hat{e}_i$ = unit eigenvector of the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ corresponding to eigenvalue $\lambda_i$.

Step #3: Determination of the sequence of unit eigenvectors $\hat{e}_1, \hat{e}_2, \hat{e}_3, \ldots, \hat{e}_n$ corresponding to the descending sequence of eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$, where $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n$.
Step #4: In order to reduce the dimension of each feature vector $X^{(k)}$ from $n$ to $l$, where $n \gg l$, the vector $\left(X^{(k)} - \mu\right)$ is projected onto the unit eigenvectors $\hat{e}_1, \hat{e}_2, \hat{e}_3, \ldots, \hat{e}_l$.

The feature vector of reduced dimension will be as follows:
$$Y^{(k)} = \begin{bmatrix}
(X^{(k)}-\mu)^{T}\hat{e}_1\\
(X^{(k)}-\mu)^{T}\hat{e}_2\\
(X^{(k)}-\mu)^{T}\hat{e}_3\\
\vdots\\
(X^{(k)}-\mu)^{T}\hat{e}_l
\end{bmatrix} \in \mathbb{R}^{l}.$$
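Putting Steps #1 to #4 together, a possible end-to-end sketch of the KL-transform reduction (the function name `kl_transform` and the data layout, one feature vector per row, are my own choices):

```python
import numpy as np

def kl_transform(X, l):
    """Reduce N feature vectors in R^n to R^l by projecting (X^(k) - mu)
    onto the l unit eigenvectors of Sigma with the largest eigenvalues."""
    N = X.shape[0]
    mu = X.mean(axis=0)                              # Step #1: mean vector
    D = X - mu
    Sigma = (D.T @ D) / N                            # Step #1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)         # Steps #2-#3 (ascending order)
    E_l = eigvecs[:, np.argsort(eigvals)[::-1][:l]]  # top-l unit eigenvectors (n x l)
    return D @ E_l                                   # Step #4: row k is Y^(k)

# Example: 100 vectors in R^10 reduced to R^3.
Y = kl_transform(np.random.default_rng(1).normal(size=(100, 10)), l=3)
print(Y.shape)   # (100, 3)
```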


Theorem:
The best possible one-dimensional representations of the set of n-dimensional feature vectors can be obtained by projecting the difference of the given vector and the mean vector onto the unit principal eigenvector (the unit eigenvector corresponding to the largest eigenvalue) of the covariance matrix ($\Sigma$) of the original feature vectors.

Proof:

Let the one-dimensional representation of $X^{(k)}$ be
$$Y^{(k)} = \mu + t^{(k)}\hat{e} \;\in \mathbb{R}^{n}.$$

Total error:
$$E = \sum_{k=1}^{N}\left\|X^{(k)} - Y^{(k)}\right\|^{2} = \sum_{k=1}^{N}\left\|X^{(k)} - (\mu + t^{(k)}\hat{e})\right\|^{2} = \sum_{k=1}^{N}\left\|(X^{(k)} - \mu) - t^{(k)}\hat{e}\right\|^{2}$$
$$= \sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} + \sum_{k=1}^{N}\left(t^{(k)}\right)^{2}\left\|\hat{e}\right\|^{2} - 2\sum_{k=1}^{N} t^{(k)}\hat{e}^{T}\left(X^{(k)} - \mu\right)$$
$$= \sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} + \sum_{k=1}^{N}\left(t^{(k)}\right)^{2} - 2\sum_{k=1}^{N} t^{(k)}\hat{e}^{T}\left(X^{(k)} - \mu\right) \qquad \ldots(1)$$
since $\hat{e}$ is a unit vector, i.e. $\|\hat{e}\| = 1$.

$$\frac{\partial E}{\partial t^{(k)}} = \frac{\partial}{\partial t^{(k)}}\left[\sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} + \sum_{k=1}^{N}\left(t^{(k)}\right)^{2} - 2\sum_{k=1}^{N} t^{(k)}\hat{e}^{T}\left(X^{(k)} - \mu\right)\right] = 2t^{(k)} - 2\hat{e}^{T}\left(X^{(k)} - \mu\right).$$

In order to minimise the total error $E$ we must have
$$\frac{\partial E}{\partial t^{(k)}} = 0 \;\Rightarrow\; t^{(k)} = \hat{e}^{T}\left(X^{(k)} - \mu\right).$$
Hence,
$$t^{(k)} = \hat{e}^{T}\left(X^{(k)} - \mu\right) = \left(X^{(k)} - \mu\right)^{T}\hat{e} \qquad \ldots(2)$$

Now from equations (1) and (2) we have
$$E = \sum_{k=1}^{N}\left\|X^{(k)} - Y^{(k)}\right\|^{2} = \sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} - \sum_{k=1}^{N}\left(t^{(k)}\right)^{2} \qquad \ldots(3)$$

From (2) and (3) we get
$$E = \sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} - \sum_{k=1}^{N}\hat{e}^{T}\left(X^{(k)} - \mu\right)\left(X^{(k)} - \mu\right)^{T}\hat{e}
= \sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} - \hat{e}^{T}\left[\sum_{k=1}^{N}\left(X^{(k)} - \mu\right)\left(X^{(k)} - \mu\right)^{T}\right]\hat{e}$$
$$= \sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} - \hat{e}^{T}(N\Sigma)\hat{e}
\qquad \left(\because \text{covariance matrix } \Sigma = \frac{1}{N}\sum_{k=1}^{N}\left(X^{(k)} - \mu\right)\left(X^{(k)} - \mu\right)^{T}\right)$$
$$= \sum_{k=1}^{N}\left\|X^{(k)} - \mu\right\|^{2} - N\left(\hat{e}^{T}\Sigma\hat{e}\right) \qquad \ldots(4)$$

The first part of equation (4), $\sum_{k=1}^{N}\|X^{(k)} - \mu\|^{2} \ge 0$, does not depend on $\hat{e}$, so the optimisation problem takes the following form:

Minimisation of $E$ $\;\equiv\;$ Maximisation of $\hat{e}^{T}\Sigma\hat{e}$ subject to the constraint $\|\hat{e}\| = 1$, i.e. $\hat{e}^{T}\hat{e} = 1$.

Hence, the Lagrangian will be the following:
$$L(\hat{e}) = \hat{e}^{T}\Sigma\hat{e} - \lambda\left(\hat{e}^{T}\hat{e} - 1\right).$$

In order to maximise $L(\hat{e})$ with respect to the unit vector $\hat{e}$ we must have
$$\nabla_{\hat{e}} L(\hat{e}) = 0 \;(\text{null vector}).$$
Now
$$\nabla_{\hat{e}} L(\hat{e}) = \nabla_{\hat{e}}\left[\hat{e}^{T}\Sigma\hat{e} - \lambda\left(\hat{e}^{T}\hat{e} - 1\right)\right] = 2\Sigma\hat{e} - 2\lambda\hat{e}.$$
Hence,
$$\nabla_{\hat{e}} L(\hat{e}) = 0 \;(\text{null vector}) \;\Rightarrow\; \Sigma\hat{e} = \lambda\hat{e}.$$

Thus, $\lambda$ and $\hat{e}$ are an eigenvalue and eigenvector pair of the covariance matrix $\Sigma$.

Now,
$$\hat{e}^{T}\Sigma\hat{e} = \hat{e}^{T}(\Sigma\hat{e}) = \hat{e}^{T}(\lambda\hat{e}) = \lambda\left(\hat{e}^{T}\hat{e}\right) = \lambda \qquad \left[\because \Sigma\hat{e} = \lambda\hat{e}\right].$$

Hence $\max_{\hat{e}}\;\hat{e}^{T}\Sigma\hat{e} = \lambda_{\max}$. We have to select the maximum eigenvalue ($\lambda_{\max}$) of the covariance matrix $\Sigma$. Thus $\hat{e}$ will be the eigenvector of the covariance matrix $\Sigma$ corresponding to the maximum eigenvalue ($\lambda_{\max}$).

Thus, the best possible one-dimensional representation of the n-dimensional feature vector $X^{(k)} = \left[x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)}\right]^{T} \in \mathbb{R}^{n}$ will be as follows:
$$t^{(k)} = \hat{e}^{T}\left(X^{(k)} - \mu\right) = \left(X^{(k)} - \mu\right)^{T}\hat{e}.$$
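As a quick numerical sanity check of the theorem (my own addition, not part of the original text), the total error $E$ along the principal eigenvector can be compared against the error along a few random unit directions; the principal direction should never do worse. All data and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4)) @ np.diag([3.0, 1.0, 0.5, 0.2])   # anisotropic toy data
mu = X.mean(axis=0)
D = X - mu
Sigma = (D.T @ D) / X.shape[0]

def total_error(e_hat):
    """E = sum_k ||X^(k) - (mu + t^(k) e_hat)||^2 with t^(k) = (X^(k) - mu)^T e_hat."""
    t = D @ e_hat
    return np.sum((D - np.outer(t, e_hat)) ** 2)

eigvals, eigvecs = np.linalg.eigh(Sigma)
e_principal = eigvecs[:, -1]           # unit eigenvector of the largest eigenvalue
for _ in range(5):
    r = rng.normal(size=4)
    assert total_error(e_principal) <= total_error(r / np.linalg.norm(r)) + 1e-9
```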
3. Multiple Discriminant Analysis (MDA):
Motivation
Principal Component Analysis (PCA)
does not take class label information
into account. PCA may remove the
components which are discriminatory.
Example
If PCA is applied to a set of feature vectors representing the two types of characters 'O' and 'Q', then the tail portion of 'Q' will be removed, which is the discriminatory component between 'O' and 'Q'. Hence, for classification-based applications the discriminatory components must be preserved after dimensionality reduction.
 
Training set:
$$T = \left\{\left(X^{(i)}, y^{(i)}\right)\right\}_{i=1}^{N}$$
where
$N$ = number of training samples,
$X^{(i)}$ = feature vector of the $i$-th sample $\in \mathbb{R}^{n}$,
$y^{(i)}$ = class label of the $i$-th sample,
$k$ = number of classes.

Let us denote these classes as $C_1, C_2, C_3, \ldots, C_k$ respectively.
$N_i$ = number of training samples belonging to class $C_i$,
$N$ = total number of training samples $= \sum_{i=1}^{k} N_i$.

$$m_i = \text{centre of the } i\text{-th class} = \frac{1}{N_i}\sum_{X \in C_i} X \;\in \mathbb{R}^{n},$$
$$m = \text{grand mean} = \frac{1}{N}\sum_{i=1}^{k} N_i m_i \;\in \mathbb{R}^{n}.$$
4. Dimension Reduction Procedure using Multiple Discriminant Analysis (MDA):

Given the training set $T = \left\{\left(X^{(i)}, y^{(i)}\right)\right\}_{i=1}^{N}$, where $X^{(i)}$ = feature vector of the $i$-th sample $\in \mathbb{R}^{n}$, $y^{(i)}$ = class label of the $i$-th sample, $k$ = number of classes (denoted $C_1, C_2, C_3, \ldots, C_k$), $N_i$ = number of training samples belonging to class $C_i$, and $N$ = total number of training samples $= \sum_{i=1}^{k} N_i$.
Step #1: Computation of the mean vector (centre of the $i$-th class) $m_i \in \mathbb{R}^{n}$ for all the classes $C_1, C_2, C_3, \ldots, C_k$ and the grand mean vector $m \in \mathbb{R}^{n}$ for the given training set $\left\{\left(X^{(i)}, y^{(i)}\right)\right\}_{i=1}^{N}$ in the following way:
$$m_i = \text{centre of the } i\text{-th class} = \frac{1}{N_i}\sum_{X \in C_i} X \;\in \mathbb{R}^{n},$$
$$m = \text{grand mean} = \frac{1}{N}\sum_{i=1}^{k} N_i m_i \;\in \mathbb{R}^{n}.$$
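A minimal sketch of Step #1, assuming `X` holds one feature vector per row and `y` holds the corresponding class labels (the function name `class_means` is my own):

```python
import numpy as np

def class_means(X, y):
    """Per-class centres m_i and the grand mean m.

    X : (N, n) feature vectors, y : (N,) class labels.
    Returns (classes, M, m), where row i of M is the centre of classes[i].
    """
    classes = np.unique(y)
    M = np.vstack([X[y == c].mean(axis=0) for c in classes])  # m_i = (1/N_i) sum_{X in C_i} X
    counts = np.array([np.sum(y == c) for c in classes])      # N_i
    m = (counts @ M) / len(y)                                 # m = (1/N) sum_i N_i m_i
    return classes, M, m
```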

Step #2: Determination of the Within-Class Scattering (Intra-Class Scattering) $S_W$ and the Between-Class Scattering (Inter-Class Scattering) $S_B$ in the following way:
$$S_W = \sum_{i=1}^{k}\sum_{X \in C_i}\left(X - m_i\right)\left(X - m_i\right)^{T}$$
and
$$S_B = \sum_{i=1}^{k} N_i\left(m_i - m\right)\left(m_i - m\right)^{T}.$$
Step #3: Determination of the eigenvalues of the discrimination matrix $S_W^{-1} S_B \in \mathbb{R}^{n \times n}$ in descending order.

Step #4: Determination of the unit eigenvectors of the discrimination matrix $S_W^{-1} S_B$ corresponding to the descending sequence of eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$, where $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n$.

The eigenvalues of the discrimination matrix $S_W^{-1} S_B \in \mathbb{R}^{n \times n}$ arranged in descending order are the following:
$$\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n,$$
where $\left(S_W^{-1} S_B\right)\hat{e}_i = \lambda_i \hat{e}_i$ for all $i = 1, 2, \ldots, n$, and $\hat{e}_i$ = unit eigenvector of the discrimination matrix $S_W^{-1} S_B \in \mathbb{R}^{n \times n}$ corresponding to eigenvalue $\lambda_i$.
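Steps #3 and #4 as a sketch. $S_W^{-1} S_B$ is generally not symmetric, so `np.linalg.eig` is used and the (numerically negligible) imaginary parts are discarded; the small ridge term guarding against a singular $S_W$ is my own addition, not part of the text:

```python
import numpy as np

def discrimination_eigenpairs(S_W, S_B, ridge=1e-8):
    """Eigenvalues and unit eigenvectors of the discrimination matrix S_W^{-1} S_B,
    sorted in descending order of eigenvalue."""
    n = S_W.shape[0]
    A = np.linalg.solve(S_W + ridge * np.eye(n), S_B)    # S_W^{-1} S_B without an explicit inverse
    eigvals, eigvecs = np.linalg.eig(A)
    eigvals, eigvecs = eigvals.real, eigvecs.real        # drop round-off imaginary parts
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]
    eigvecs = eigvecs / np.linalg.norm(eigvecs, axis=0)  # re-normalise to unit eigenvectors
    return eigvals[order], eigvecs
```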
Step #5: In order to reduce the dimension of each feature vector $X^{(k)}$ from $n$ to $l$, where $n \gg l$, the vector $\left(X^{(k)} - m\right)$ is projected onto the unit eigenvectors $\hat{e}_1, \hat{e}_2, \hat{e}_3, \ldots, \hat{e}_l$.

The feature vector of reduced dimension will be as follows:
$$Y^{(k)} = \begin{bmatrix}
(X^{(k)}-m)^{T}\hat{e}_1\\
(X^{(k)}-m)^{T}\hat{e}_2\\
(X^{(k)}-m)^{T}\hat{e}_3\\
\vdots\\
(X^{(k)}-m)^{T}\hat{e}_l
\end{bmatrix} \in \mathbb{R}^{l}.$$
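Putting Steps #1 to #5 together, a self-contained sketch of the MDA reduction (`mda_transform`, the ridge term, and the data layout are my own choices):

```python
import numpy as np

def mda_transform(X, y, l, ridge=1e-8):
    """Project each (X^(k) - m) onto the top-l eigenvectors of S_W^{-1} S_B."""
    n = X.shape[1]
    m = X.mean(axis=0)                                  # grand mean (Step #1)
    S_W = np.zeros((n, n)); S_B = np.zeros((n, n))
    for c in np.unique(y):                              # Step #2
        X_c = X[y == c]; m_c = X_c.mean(axis=0)
        D_c = X_c - m_c
        S_W += D_c.T @ D_c
        S_B += len(X_c) * np.outer(m_c - m, m_c - m)
    A = np.linalg.solve(S_W + ridge * np.eye(n), S_B)   # Steps #3-#4
    eigvals, eigvecs = np.linalg.eig(A)
    E_l = eigvecs[:, np.argsort(eigvals.real)[::-1][:l]].real   # top-l eigenvectors
    return (X - m) @ E_l                                # Step #5: row k is Y^(k) in R^l
```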

Theorem:
The best direction for discrimination is along the unit eigenvector ($\hat{W}$) of the discrimination matrix $S_W^{-1} S_B$ corresponding to its highest eigenvalue ($\lambda_{\max}$).
Proof:
Feature vectors $X^{(i)} \in \mathbb{R}^{n}$, $i = 1, 2, 3, \ldots, N$, are now projected onto some direction given by the unit vector $\hat{W} \in \mathbb{R}^{n}$ to determine the most discriminatory feature between classes. Let the feature vectors be transformed to the following set of scalars:
$$Y^{(i)} = \hat{W}^{T} X^{(i)}, \qquad i = 1, 2, 3, \ldots, N.$$

$$\tilde{m}_i = \text{centre of the } i\text{-th class in the transformed domain} = \frac{1}{N_i}\sum_{Y \in C_i} Y,$$
$$\tilde{m} = \text{grand mean in the transformed domain} = \frac{1}{N}\sum_{i=1}^{k} N_i \tilde{m}_i.$$

$$\tilde{m}_i = \frac{1}{N_i}\sum_{Y \in C_i} Y = \frac{1}{N_i}\sum_{X \in C_i} \hat{W}^{T} X = \hat{W}^{T}\left(\frac{1}{N_i}\sum_{X \in C_i} X\right) = \hat{W}^{T} m_i,$$
$$\tilde{m} = \frac{1}{N}\sum_{i=1}^{k} N_i \tilde{m}_i = \frac{1}{N}\sum_{i=1}^{k} N_i \hat{W}^{T} m_i = \hat{W}^{T}\left(\frac{1}{N}\sum_{i=1}^{k} N_i m_i\right) = \hat{W}^{T} m.$$

In the original feature space,
$$S_W = \text{within-class scattering} = \sum_{i=1}^{k}\sum_{X \in C_i}\left(X - m_i\right)\left(X - m_i\right)^{T},$$
$$S_B = \text{between-class scattering} = \sum_{i=1}^{k} N_i\left(m_i - m\right)\left(m_i - m\right)^{T}.$$

Within-Class Scattering or Intra-Class Scattering

1. Within-Class Scattering (Intra-Class Scattering) in the original feature space:
$$S_W = \sum_{i=1}^{k}\sum_{X \in C_i}\left(X - m_i\right)\left(X - m_i\right)^{T}.$$

2. Within-Class Scattering (Intra-Class Scattering) in the transformed feature space:
$$\tilde{S}_W = \sum_{i=1}^{k}\sum_{Y \in C_i}\left(Y - \tilde{m}_i\right)\left(Y - \tilde{m}_i\right)^{T}
= \sum_{i=1}^{k}\sum_{X \in C_i}\left(\hat{W}^{T}X - \hat{W}^{T}m_i\right)\left(\hat{W}^{T}X - \hat{W}^{T}m_i\right)^{T}$$
$$= \hat{W}^{T}\left[\sum_{i=1}^{k}\sum_{X \in C_i}\left(X - m_i\right)\left(X - m_i\right)^{T}\right]\hat{W}
= \hat{W}^{T} S_W \hat{W}.$$

Small Within-Class Scattering (Intra-Class Scattering) will make the classes compact. The Within-Class Scattering (Intra-Class Scattering) should therefore be as small as possible.
Between-Class Scattering or Inter-Class Scattering

1. Between-Class Scattering (Inter-Class Scattering) in the original feature space:
$$S_B = \sum_{i=1}^{k} N_i\left(m_i - m\right)\left(m_i - m\right)^{T}.$$

2. Between-Class Scattering (Inter-Class Scattering) in the transformed feature space:
$$\tilde{S}_B = \sum_{i=1}^{k} N_i\left(\tilde{m}_i - \tilde{m}\right)\left(\tilde{m}_i - \tilde{m}\right)^{T}
= \sum_{i=1}^{k} N_i\left(\hat{W}^{T}m_i - \hat{W}^{T}m\right)\left(\hat{W}^{T}m_i - \hat{W}^{T}m\right)^{T}$$
$$= \hat{W}^{T}\left[\sum_{i=1}^{k} N_i\left(m_i - m\right)\left(m_i - m\right)^{T}\right]\hat{W}
= \hat{W}^{T} S_B \hat{W}.$$

Large Between-Class Scattering (Inter-Class Scattering) will make the classes well separable. The Between-Class Scattering (Inter-Class Scattering) should therefore be as large as possible.
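The two projection identities $\tilde{S}_W = \hat{W}^{T} S_W \hat{W}$ and $\tilde{S}_B = \hat{W}^{T} S_B \hat{W}$ are easy to check numerically. A sketch on toy data (all data and names here are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy data: three classes in R^4 with shifted means.
centres = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.normal(size=(50, 4)) + c for c in centres])
y = np.repeat([0, 1, 2], 50)

# S_W and S_B in the original feature space (as in Step #2 of Section 4).
n = X.shape[1]; m = X.mean(axis=0)
S_W = np.zeros((n, n)); S_B = np.zeros((n, n))
for c in np.unique(y):
    X_c = X[y == c]; m_c = X_c.mean(axis=0)
    S_W += (X_c - m_c).T @ (X_c - m_c)
    S_B += len(X_c) * np.outer(m_c - m, m_c - m)

# Scattering of the projected (scalar) data along a random unit direction W_hat.
W = rng.normal(size=n); W /= np.linalg.norm(W)
Y = X @ W
S_W_t = sum(np.sum((Y[y == c] - Y[y == c].mean()) ** 2) for c in np.unique(y))
S_B_t = sum(np.sum(y == c) * (Y[y == c].mean() - Y.mean()) ** 2 for c in np.unique(y))

assert np.isclose(S_W_t, W @ S_W @ W)   # tilde(S_W) = W^T S_W W
assert np.isclose(S_B_t, W @ S_B @ W)   # tilde(S_B) = W^T S_B W
```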

The ratio of the Between-Class Scattering (Inter-Class Scattering) to the Within-Class Scattering (Intra-Class Scattering) should be as large as possible, so that the classes are compact and well separable:
$$J(\hat{W}) = \frac{\tilde{S}_B}{\tilde{S}_W} = \frac{\hat{W}^{T} S_B \hat{W}}{\hat{W}^{T} S_W \hat{W}} \qquad \ldots(1)$$

Hence, we have to maximise $J(\hat{W})$:
$$\nabla_{\hat{W}}\left(\frac{\hat{W}^{T} S_B \hat{W}}{\hat{W}^{T} S_W \hat{W}}\right) = 0 \;(\text{null vector})$$
$$\Rightarrow\;\frac{\left(\hat{W}^{T} S_W \hat{W}\right)\nabla_{\hat{W}}\left(\hat{W}^{T} S_B \hat{W}\right) - \left(\hat{W}^{T} S_B \hat{W}\right)\nabla_{\hat{W}}\left(\hat{W}^{T} S_W \hat{W}\right)}{\left(\hat{W}^{T} S_W \hat{W}\right)^{2}} = 0$$
$$\Rightarrow\;\left(\hat{W}^{T} S_W \hat{W}\right)\left(2 S_B \hat{W}\right) - \left(\hat{W}^{T} S_B \hat{W}\right)\left(2 S_W \hat{W}\right) = 0$$
$$\Rightarrow\;\left(\hat{W}^{T} S_W \hat{W}\right) S_B \hat{W} = \left(\hat{W}^{T} S_B \hat{W}\right) S_W \hat{W}$$
$$\Rightarrow\; S_B \hat{W} = \frac{\hat{W}^{T} S_B \hat{W}}{\hat{W}^{T} S_W \hat{W}}\, S_W \hat{W}$$
$$\Rightarrow\; S_B \hat{W} = \lambda\, S_W \hat{W} \qquad \ldots(2)$$
where
$$\lambda = \frac{\hat{W}^{T} S_B \hat{W}}{\hat{W}^{T} S_W \hat{W}} = \text{some scalar}.$$
$$\Rightarrow\;\left(S_W^{-1} S_B\right)\hat{W} = \lambda\hat{W}.$$

The matrix $S_W^{-1} S_B \in \mathbb{R}^{n \times n}$ is known as the discrimination matrix. Hence, $\lambda$ and $\hat{W}$ are an eigenvalue and unit eigenvector pair of the discrimination matrix.

Now
$$J(\hat{W}) = \frac{\hat{W}^{T} S_B \hat{W}}{\hat{W}^{T} S_W \hat{W}} = \lambda
\;\Rightarrow\; J_{\max}(\hat{W}) = \lambda_{\max} = \text{maximum eigenvalue of } S_W^{-1} S_B.$$

Thus, the best choice of $\hat{W}$ is the unit eigenvector corresponding to the maximum eigenvalue of the matrix $S_W^{-1} S_B$.
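Finally, a numerical check of this result (my own sketch, on toy data): the Fisher criterion $J(\hat{W})$ evaluated at the principal eigenvector of $S_W^{-1} S_B$ should equal $\lambda_{\max}$ and should be at least as large as $J$ at any random direction.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy data: three classes in R^3 with shifted means.
centres = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
X = np.vstack([rng.normal(size=(60, 3)) + c for c in centres])
y = np.repeat([0, 1, 2], 60)

n = X.shape[1]; m = X.mean(axis=0)
S_W = np.zeros((n, n)); S_B = np.zeros((n, n))
for c in np.unique(y):
    X_c = X[y == c]; m_c = X_c.mean(axis=0)
    S_W += (X_c - m_c).T @ (X_c - m_c)
    S_B += len(X_c) * np.outer(m_c - m, m_c - m)

def J(W):
    return (W @ S_B @ W) / (W @ S_W @ W)     # Fisher criterion J(W_hat); scale-invariant

eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
W_best = eigvecs[:, np.argmax(eigvals.real)].real
assert np.isclose(J(W_best), eigvals.real.max())   # J_max = lambda_max
for _ in range(10):
    r = rng.normal(size=n)
    assert J(W_best) >= J(r) - 1e-9                # no direction beats the eigenvector
```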
