
PCA

By

Dr. Pawan Kumar Tiwari


Component Analysis
• One approach to coping with the problem of excessive dimensionality is to reduce the dimensionality by combining features.
• Linear combinations are particularly attractive because they are simple to compute and analytically tractable.
• Linear methods project the high-dimensional data into a lower-dimensional space.
• There are two classical approaches to finding an effective linear transformation.
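As a rough sketch of what such a linear projection looks like in code (assuming NumPy; the matrix W below is an arbitrary placeholder, not the optimal transformation derived later):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples in a 10-dimensional space

# Any d x d' matrix defines a linear map into a d'-dimensional space.
# Here W is arbitrary; PCA and MDA choose the projection by different criteria.
W = rng.normal(size=(10, 2))
Y = X @ W                        # projected data, shape (100, 2)
print(Y.shape)
```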
Cont…
PCA & MDA
Principal Component Analysis (PCA) seeks a projection that best represents the data in the least-squares sense.
Multiple Discriminant Analysis (MDA) seeks a projection that best separates the data in the least-squares sense.
Principal Component Analysis (PCA)

Can you think of any method to represent all of the vectors in a set of n d-dimensional samples x_1, ..., x_n by a single vector x_0?
Cont..
Suppose we want to find a vector x_0 such that the sum of the squared distances between x_0 and the various x_k is as small as possible.
Let us define the squared-error criterion function J_0(x_0) by

J_0(x_0) = \sum_{k=1}^{n} \| x_0 - x_k \|^2        (1)

and seek the value of x_0 that minimizes J_0.
Cont..
• What value of x_0 will minimize J_0 in equation (1)?
Cont..
The solution to this problem is x_0 = m, where m is the sample mean

m = \frac{1}{n} \sum_{k=1}^{n} x_k        (2)
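A quick numerical check of this claim (a sketch assuming NumPy and synthetic data; J0 is the criterion of equation (1)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # n = 50 samples, d = 3

def J0(x0, X):
    """Squared-error criterion of equation (1)."""
    return np.sum(np.linalg.norm(x0 - X, axis=1) ** 2)

m = X.mean(axis=0)                    # sample mean, equation (2)
# J0 at the mean is no larger than at any other candidate point.
print(J0(m, X) <= J0(m + 0.1, X))     # True
print(J0(m, X) <= J0(np.zeros(3), X)) # True
```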
Visualization of Solution
• Equation (1) can also be written as

J_0(x_0) = \sum_{k=1}^{n} \| (x_0 - m) - (x_k - m) \|^2
         = \sum_{k=1}^{n} \| x_0 - m \|^2 - 2 \sum_{k=1}^{n} (x_0 - m)^T (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2
         = \sum_{k=1}^{n} \| x_0 - m \|^2 - 2 (x_0 - m)^T \sum_{k=1}^{n} (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2        (3)

• Since \sum_{k=1}^{n} (x_k - m) = 0, the middle term vanishes; the only remaining dependence on x_0 is \sum_{k=1}^{n} \| x_0 - m \|^2, which is minimized by x_0 = m.
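The expansion in equation (3) can be verified numerically; a small sketch assuming NumPy and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
m = X.mean(axis=0)
x0 = rng.normal(size=4)

lhs = np.sum(np.linalg.norm(x0 - X, axis=1) ** 2)            # equation (1)
rhs = (len(X) * np.linalg.norm(x0 - m) ** 2                  # first sum of (3)
       - 2 * (x0 - m) @ np.sum(X - m, axis=0)                # middle term, ~0
       + np.sum(np.linalg.norm(X - m, axis=1) ** 2))         # last sum of (3)
print(np.isclose(lhs, rhs))                                  # True
```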
Cont..
• The sample mean is a zero-dimensional representation of the data set.
• It is simple, but it does not reveal any of the variability in the data.
• We can obtain a more interesting, one-dimensional representation by projecting the data onto a line running through the sample mean.
• Let e be the unit vector in the direction of the line. Then the equation of the line can be written as

x = m + a e        (4)

where the scalar a corresponds to the distance of any point x from the mean m.
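A minimal sketch of equation (4), assuming NumPy; the direction e here is an arbitrary unit vector chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
m = X.mean(axis=0)

e = np.array([1.0, 1.0])
e = e / np.linalg.norm(e)        # unit vector giving the line's direction

# Points x = m + a*e for a range of scalar distances a, equation (4)
a = np.linspace(-3, 3, 7)
line_points = m + a[:, None] * e
print(line_points.shape)         # (7, 2)
```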
Cont..
If we represent x_k by m + a_k e, we can find an "optimal" set of coefficients a_k by minimizing the squared-error criterion function

J_1(a_1, ..., a_n, e) = \sum_{k=1}^{n} \| (m + a_k e) - x_k \|^2 = \sum_{k=1}^{n} \| a_k e - (x_k - m) \|^2
                      = \sum_{k=1}^{n} a_k^2 \| e \|^2 - 2 \sum_{k=1}^{n} a_k e^T (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2        (5)
Cont…
Recognizing that e partially
1, differentiating with
respect to and asetting
k the derivative to zero, we
obtain
ak  eT ( xk  m)    (6)
Geometrically, this result merely says that we obtain a
least square solution by projecting the vector
onto the line in
xkdirection of e that passes through
the sample mean.
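A sketch of equation (6) in NumPy; e is again an arbitrary unit vector, since the best direction has not yet been derived. The check at the end confirms that the residuals are perpendicular to e, i.e. that m + a_k e is the orthogonal projection of x_k onto the line:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
m = X.mean(axis=0)

e = np.array([1.0, 2.0, -1.0])
e = e / np.linalg.norm(e)

# Optimal coefficients a_k = e^T (x_k - m), equation (6)
a = (X - m) @ e

# The residuals of the reconstruction m + a_k e are perpendicular to e.
residuals = X - (m + a[:, None] * e)
print(np.allclose(residuals @ e, 0.0))   # True
```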
Cont..
• This brings us to the more interesting problem of finding the best direction e for the line.
• The solution to this problem involves the so-called scatter matrix S defined by

S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T        (7)

• It arises when we substitute the value of a_k from equation (6) into equation (5).
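A sketch of equation (7) assuming NumPy; note that this scatter matrix equals n - 1 times the sample covariance matrix as np.cov computes it by default:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 3))
m = X.mean(axis=0)

# Scatter matrix S = sum_k (x_k - m)(x_k - m)^T, equation (7)
S = (X - m).T @ (X - m)
print(S.shape)                                     # (3, 3)
print(np.allclose(S, (len(X) - 1) * np.cov(X.T)))  # True
```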
Cont…
J_1(e) = \sum_{k=1}^{n} a_k^2 - 2 \sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \| x_k - m \|^2
       = - \sum_{k=1}^{n} [ e^T (x_k - m) ]^2 + \sum_{k=1}^{n} \| x_k - m \|^2
       = - \sum_{k=1}^{n} e^T (x_k - m)(x_k - m)^T e + \sum_{k=1}^{n} \| x_k - m \|^2
       = - e^T S e + \sum_{k=1}^{n} \| x_k - m \|^2        (8)
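Equation (8) can be checked numerically as well; a sketch assuming NumPy, with e an arbitrary unit vector:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(25, 3))
m = X.mean(axis=0)
S = (X - m).T @ (X - m)

e = rng.normal(size=3)
e = e / np.linalg.norm(e)
a = (X - m) @ e                                   # equation (6)

# Direct evaluation of J1 from equation (5) with the optimal a_k ...
J1_direct = np.sum(np.linalg.norm(m + a[:, None] * e - X, axis=1) ** 2)
# ... equals -e^T S e plus a term that does not depend on e, equation (8)
J1_eq8 = -e @ S @ e + np.sum(np.linalg.norm(X - m, axis=1) ** 2)
print(np.isclose(J1_direct, J1_eq8))              # True
```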
Cont..
Clearly the vector e that minimizes J_1 will also maximize e^T S e.
We will apply the method of Lagrange multipliers to maximize e^T S e subject to the constraint \| e \| = 1.
Cont..
Let \lambda be the undetermined multiplier. We differentiate

u = e^T S e - \lambda (e^T e - 1)        (9)

with respect to e to obtain

\frac{\partial u}{\partial e} = 2 S e - 2 \lambda e        (10)

Setting this gradient vector equal to zero, we see that e must be an eigenvector of the scatter matrix:

S e = \lambda e        (11)
Cont...
• In particular, because e^T S e = \lambda e^T e = \lambda, it follows that to maximize e^T S e, we want to select the eigenvector corresponding to the largest eigenvalue of the scatter matrix.
• This result can be readily extended from a one-dimensional projection to a d'-dimensional projection. In place of Eqn. (4) we write

x = m + \sum_{i=1}^{d'} a_i e_i        (12)

where d' \le d.
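A sketch of this eigenvector result, assuming NumPy; np.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order, so the last column corresponds to the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 4)) @ np.diag([3.0, 1.0, 0.5, 0.2])  # anisotropic data
m = X.mean(axis=0)
S = (X - m).T @ (X - m)

# S is real and symmetric, so eigh applies.
eigvals, eigvecs = np.linalg.eigh(S)
e_best = eigvecs[:, -1]          # eigenvector of the largest eigenvalue

# e_best satisfies S e = lambda e and maximizes e^T S e over unit vectors.
print(np.allclose(S @ e_best, eigvals[-1] * e_best))     # True
e_random = rng.normal(size=4)
e_random = e_random / np.linalg.norm(e_random)
print(e_best @ S @ e_best >= e_random @ S @ e_random)    # True
```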
Cont..
It is not difficult to show that the criterion function

J_{d'} = \sum_{k=1}^{n} \left\| \left( m + \sum_{i=1}^{d'} a_{ki} e_i \right) - x_k \right\|^2        (13)

is minimized when the vectors e_1, ..., e_{d'} are the d' eigenvectors of the scatter matrix having the largest eigenvalues.
Because the scatter matrix is real and symmetric, these eigenvectors are orthogonal.
Cont..
• They form a natural set of basis vectors for representing any vector x.
• The coefficients a_i in Eqn. (12) are the components of x in that basis and are called the principal components.
Example
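A minimal end-to-end sketch of the procedure in NumPy, following equations (2), (7), (11), and (12), with synthetic data standing in for a concrete example:

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic data: 200 samples in d = 5 dimensions with correlated features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

d_prime = 2                                  # target dimensionality d' <= d

m = X.mean(axis=0)                           # sample mean, equation (2)
S = (X - m).T @ (X - m)                      # scatter matrix, equation (7)

# Eigenvectors of S; eigh returns them in ascending order of eigenvalue
eigvals, eigvecs = np.linalg.eigh(S)
E = eigvecs[:, ::-1][:, :d_prime]            # d' eigenvectors with largest eigenvalues

# Principal components: coefficients a_i of each sample in the basis e_1, ..., e_d'
A = (X - m) @ E                              # shape (200, d')

# Approximate reconstruction x ~ m + sum_i a_i e_i, equation (12)
X_hat = m + A @ E.T
print(A.shape, np.linalg.norm(X - X_hat))    # residual shrinks as d' grows
```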
