
PCA

By

Dr. Pawan Kumar Tiwari


Component Analysis
• One approach to coping with the problem of excessive dimensionality is to reduce the dimensionality by combining features.
• Linear combinations are particularly attractive because they are simple to compute and analytically tractable.
• Linear methods project the high-dimensional data into a lower-dimensional space.
• There are two classical approaches to finding an effective linear transformation.
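As a rough sketch of what such a linear projection looks like in code (assuming NumPy; the matrix W below is an arbitrary placeholder, not the optimal transformation derived later):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples in a 10-dimensional space

# Any d x d' matrix defines a linear map into a d'-dimensional space.
# Here W is arbitrary; PCA and MDA choose the projection by different criteria.
W = rng.normal(size=(10, 2))
Y = X @ W                        # projected data, shape (100, 2)
print(Y.shape)
```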
Cont…
PCA & MDA
Principal Component Analysis (PCA) seeks a projection that best represents the data in the least-squares sense.
Multiple Discriminant Analysis (MDA) seeks a projection that best separates the data in the least-squares sense.
Principal Component Analysis (PCA)

Can you think of any method to represent all of the vectors in a set of n d-dimensional samples x_1, ..., x_n by a single vector x_0?
Cont..
Suppose we want to find a vector x_0 such that the sum of the squared distances between x_0 and the various x_k is as small as possible.
Let us define the squared-error criterion function J_0(x_0) by

J_0(x_0) = \sum_{k=1}^{n} \| x_0 - x_k \|^2        (1)

and seek the value of x_0 that minimizes J_0.
Cont..
• What value of x_0 will minimize J_0 in equation (1)?
Cont..
The solution to this problem is x_0 = m, where m is the sample mean

m = \frac{1}{n} \sum_{k=1}^{n} x_k        (2)
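A quick numerical check of this claim (a sketch assuming NumPy and synthetic data; J0 is the criterion of equation (1)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # n = 50 samples, d = 3

def J0(x0, X):
    """Squared-error criterion of equation (1)."""
    return np.sum(np.linalg.norm(x0 - X, axis=1) ** 2)

m = X.mean(axis=0)                    # sample mean, equation (2)
# J0 at the mean is no larger than at any other candidate point.
print(J0(m, X) <= J0(m + 0.1, X))     # True
print(J0(m, X) <= J0(np.zeros(3), X)) # True
```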
Visualization of Solution
• Equation (1) can also be written as

J_0(x_0) = \sum_{k=1}^{n} \| (x_0 - m) - (x_k - m) \|^2
         = \sum_{k=1}^{n} \| x_0 - m \|^2 - 2 \sum_{k=1}^{n} (x_0 - m)^T (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2
         = \sum_{k=1}^{n} \| x_0 - m \|^2 - 2 (x_0 - m)^T \sum_{k=1}^{n} (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2        (3)

• Since \sum_{k=1}^{n} (x_k - m) = 0, the middle term vanishes; the only remaining dependence on x_0 is \sum_{k=1}^{n} \| x_0 - m \|^2, which is minimized by x_0 = m.
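The expansion in equation (3) can be verified numerically; a small sketch assuming NumPy and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
m = X.mean(axis=0)
x0 = rng.normal(size=4)

lhs = np.sum(np.linalg.norm(x0 - X, axis=1) ** 2)            # equation (1)
rhs = (len(X) * np.linalg.norm(x0 - m) ** 2                  # first sum of (3)
       - 2 * (x0 - m) @ np.sum(X - m, axis=0)                # middle term, ~0
       + np.sum(np.linalg.norm(X - m, axis=1) ** 2))         # last sum of (3)
print(np.isclose(lhs, rhs))                                  # True
```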
Cont..
• The sample mean is a zero-dimensional representation of the data set.
• It is simple, but it does not reveal any of the variability in the data.
• We can obtain a more interesting, one-dimensional representation by projecting the data onto a line running through the sample mean.
• Let e be the unit vector in the direction of the line. Then the equation of the line can be written as

x = m + a e        (4)

where the scalar a corresponds to the distance of any point x from the mean m.
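A minimal sketch of equation (4), assuming NumPy; the direction e here is an arbitrary unit vector chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
m = X.mean(axis=0)

e = np.array([1.0, 1.0])
e = e / np.linalg.norm(e)        # unit vector giving the line's direction

# Points x = m + a*e for a range of scalar distances a, equation (4)
a = np.linspace(-3, 3, 7)
line_points = m + a[:, None] * e
print(line_points.shape)         # (7, 2)
```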
Cont..
If we represent x_k by m + a_k e, we can find an "optimal" set of coefficients a_k by minimizing the squared-error criterion function

J_1(a_1, ..., a_n, e) = \sum_{k=1}^{n} \| (m + a_k e) - x_k \|^2 = \sum_{k=1}^{n} \| a_k e - (x_k - m) \|^2
                      = \sum_{k=1}^{n} a_k^2 \| e \|^2 - 2 \sum_{k=1}^{n} a_k e^T (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2        (5)
Cont…
Recognizing that e partially
1, differentiating with
respect to and asetting
k the derivative to zero, we
obtain
ak  eT ( xk  m)    (6)
Geometrically, this result merely says that we obtain a
least square solution by projecting the vector
onto the line in
xkdirection of e that passes through
the sample mean.
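A sketch of equation (6) in NumPy; e is again an arbitrary unit vector, since the best direction has not yet been derived. The check at the end confirms that the residuals are perpendicular to e, i.e. that m + a_k e is the orthogonal projection of x_k onto the line:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
m = X.mean(axis=0)

e = np.array([1.0, 2.0, -1.0])
e = e / np.linalg.norm(e)

# Optimal coefficients a_k = e^T (x_k - m), equation (6)
a = (X - m) @ e

# The residuals of the reconstruction m + a_k e are perpendicular to e.
residuals = X - (m + a[:, None] * e)
print(np.allclose(residuals @ e, 0.0))   # True
```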
Cont..
• This brings us to the more interesting problem of finding the best direction e for the line.
• The solution to this problem involves the so-called scatter matrix S defined by

S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T        (7)

• It arises when we substitute the value of a_k from equation (6) into equation (5).
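A sketch of equation (7) assuming NumPy; note that this scatter matrix equals n - 1 times the sample covariance matrix as np.cov computes it by default:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 3))
m = X.mean(axis=0)

# Scatter matrix S = sum_k (x_k - m)(x_k - m)^T, equation (7)
S = (X - m).T @ (X - m)
print(S.shape)                                     # (3, 3)
print(np.allclose(S, (len(X) - 1) * np.cov(X.T)))  # True
```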
Cont…
J_1(e) = \sum_{k=1}^{n} a_k^2 - 2 \sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \| x_k - m \|^2
       = - \sum_{k=1}^{n} [ e^T (x_k - m) ]^2 + \sum_{k=1}^{n} \| x_k - m \|^2
       = - \sum_{k=1}^{n} e^T (x_k - m)(x_k - m)^T e + \sum_{k=1}^{n} \| x_k - m \|^2
       = - e^T S e + \sum_{k=1}^{n} \| x_k - m \|^2        (8)
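Equation (8) can be checked numerically as well; a sketch assuming NumPy, with e an arbitrary unit vector:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(25, 3))
m = X.mean(axis=0)
S = (X - m).T @ (X - m)

e = rng.normal(size=3)
e = e / np.linalg.norm(e)
a = (X - m) @ e                                   # equation (6)

# Direct evaluation of J1 from equation (5) with the optimal a_k ...
J1_direct = np.sum(np.linalg.norm(m + a[:, None] * e - X, axis=1) ** 2)
# ... equals -e^T S e plus a term that does not depend on e, equation (8)
J1_eq8 = -e @ S @ e + np.sum(np.linalg.norm(X - m, axis=1) ** 2)
print(np.isclose(J1_direct, J1_eq8))              # True
```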
Cont..
Clearly the vector e that minimizes J_1 will also maximize e^T S e.
We will apply the method of Lagrange multipliers to maximize e^T S e subject to the constraint \| e \| = 1.
Cont..
Let \lambda be the undetermined multiplier. We differentiate

u = e^T S e - \lambda (e^T e - 1)        (9)

with respect to e to obtain

\frac{\partial u}{\partial e} = 2 S e - 2 \lambda e        (10)

Setting this gradient vector equal to zero, we see that e must be an eigenvector of the scatter matrix:

S e = \lambda e        (11)
Cont...
• In particular, because e^T S e = \lambda e^T e = \lambda, it follows that to maximize e^T S e, we want to select the eigenvector corresponding to the largest eigenvalue of the scatter matrix.
• This result can be readily extended from a one-dimensional projection to a d'-dimensional projection. In place of Eqn. (4) we write

x = m + \sum_{i=1}^{d'} a_i e_i        (12)

where d' \le d.
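A sketch of this eigenvector result, assuming NumPy; np.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order, so the last column corresponds to the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 4)) @ np.diag([3.0, 1.0, 0.5, 0.2])  # anisotropic data
m = X.mean(axis=0)
S = (X - m).T @ (X - m)

# S is real and symmetric, so eigh applies.
eigvals, eigvecs = np.linalg.eigh(S)
e_best = eigvecs[:, -1]          # eigenvector of the largest eigenvalue

# e_best satisfies S e = lambda e and maximizes e^T S e over unit vectors.
print(np.allclose(S @ e_best, eigvals[-1] * e_best))     # True
e_random = rng.normal(size=4)
e_random = e_random / np.linalg.norm(e_random)
print(e_best @ S @ e_best >= e_random @ S @ e_random)    # True
```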
Cont..
It is not difficult to show that the criterion function

J_{d'} = \sum_{k=1}^{n} \left\| \left( m + \sum_{i=1}^{d'} a_{ki} e_i \right) - x_k \right\|^2        (13)

is minimized when the vectors e_1, ..., e_{d'} are the d' eigenvectors of the scatter matrix having the largest eigenvalues.
Because the scatter matrix is real and symmetric, these eigenvectors are orthogonal.
Cont..
• They form a natural set of basis vectors for representing any vector x.
• The coefficients a_i in Eqn. (12) are the components of x in that basis and are called the principal components.
Example
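A minimal end-to-end sketch of the procedure in NumPy, following equations (2), (7), (11), and (12), with synthetic data standing in for a concrete example:

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic data: 200 samples in d = 5 dimensions with correlated features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

d_prime = 2                                  # target dimensionality d' <= d

m = X.mean(axis=0)                           # sample mean, equation (2)
S = (X - m).T @ (X - m)                      # scatter matrix, equation (7)

# Eigenvectors of S; eigh returns them in ascending order of eigenvalue
eigvals, eigvecs = np.linalg.eigh(S)
E = eigvecs[:, ::-1][:, :d_prime]            # d' eigenvectors with largest eigenvalues

# Principal components: coefficients a_i of each sample in the basis e_1, ..., e_d'
A = (X - m) @ E                              # shape (200, d')

# Approximate reconstruction x ~ m + sum_i a_i e_i, equation (12)
X_hat = m + A @ E.T
print(A.shape, np.linalg.norm(X - X_hat))    # residual shrinks as d' grows
```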
