7. PCA & MDA
2. Dimension Reduction Procedure using KL-Transform:

Step #1: Computation of the mean vector $\mu \in \mathbb{R}^{n}$ and the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ for a given set of feature vectors $\left\{ X^{(k)} \in \mathbb{R}^{n} \right\}_{k=1}^{N}$:

$$\mu = \frac{1}{N}\sum_{k=1}^{N} X^{(k)}, \qquad \text{where } \mu = \left( \mu_{1}, \mu_{2}, \ldots, \mu_{n} \right)^{T} \in \mathbb{R}^{n}$$

$$\Sigma = \frac{1}{N}\sum_{k=1}^{N} \left( X^{(k)} - \mu \right) \left( X^{(k)} - \mu \right)^{T}
= \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{bmatrix} \in \mathbb{R}^{n \times n}$$

where $\sigma_{ij}$ = covariance between the $i$th component and the $j$th component of the sample feature vectors
$$\sigma_{ij} = \frac{1}{N}\sum_{k=1}^{N} \left( x_{i}^{(k)} - \mu_{i} \right) \left( x_{j}^{(k)} - \mu_{j} \right), \qquad \sigma_{ji} = \sigma_{ij}$$

and $\sigma_{ii}$ = variance of the $i$th component of the feature vector
$$\sigma_{ii} = \frac{1}{N}\sum_{k=1}^{N} \left( x_{i}^{(k)} - \mu_{i} \right)^{2}$$
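The computation in Step #1 can be sketched in NumPy as follows. This is a minimal illustration, not part of the original notes; it assumes the $N$ feature vectors are stacked as the rows of an array `X` and uses the $1/N$ normalisation given above (note that `numpy.cov` would use $1/(N-1)$ by default):

```python
import numpy as np

# Minimal sketch of Step #1: X is assumed to be an (N, n) array whose rows
# are the feature vectors X^(k); names are illustrative, not from the notes.
def mean_and_covariance(X):
    N = X.shape[0]
    mu = X.mean(axis=0)          # mean vector, shape (n,)
    D = X - mu                   # centred data, row k is X^(k) - mu
    Sigma = (D.T @ D) / N        # covariance matrix, shape (n, n), 1/N normalisation
    return mu, Sigma
```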
Step #2: Determination of the eigenvalues of the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ in descending order.

The eigenvalues of the covariance matrix arranged in descending order are the following:
$$\lambda_{1} \geq \lambda_{2} \geq \lambda_{3} \geq \cdots \geq \lambda_{n}$$
where $\Sigma \hat{e}_{i} = \lambda_{i} \hat{e}_{i}$ for all $i = 1, 2, \ldots, n$, and $\hat{e}_{i}$ = unit eigenvector of the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ corresponding to the eigenvalue $\lambda_{i}$.

Step #3: Determination of the unit eigenvectors $\hat{e}_{1}, \hat{e}_{2}, \hat{e}_{3}, \ldots, \hat{e}_{n}$ corresponding to the descending sequence of eigenvalues $\lambda_{1}, \lambda_{2}, \lambda_{3}, \ldots, \lambda_{n}$, where $\lambda_{1} \geq \lambda_{2} \geq \lambda_{3} \geq \cdots \geq \lambda_{n}$.
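A minimal sketch of Steps #2 and #3, assuming `Sigma` is the covariance matrix computed above. `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted into the descending order required here (illustrative, not from the notes):

```python
import numpy as np

# Sketch of Steps #2-#3: eigenvalues/eigenvectors of Sigma in descending order.
def sorted_eigendecomposition(Sigma):
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # Sigma is symmetric
    order = np.argsort(eigvals)[::-1]          # indices for descending order
    return eigvals[order], eigvecs[:, order]   # columns are unit eigenvectors e_i
```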
Step #4: In order to reduce the dimension of each feature vector $X^{(k)}$ from $n$-dimension to $l$-dimension, where $n \gg l$, the mean-subtracted vector $\left( X^{(k)} - \mu \right)$ is projected on the unit eigenvectors $\hat{e}_{1}, \hat{e}_{2}, \hat{e}_{3}, \ldots, \hat{e}_{l}$:

$$Y^{(k)} = \begin{bmatrix}
\left( X^{(k)} - \mu \right)^{T} \hat{e}_{1} \\
\left( X^{(k)} - \mu \right)^{T} \hat{e}_{2} \\
\vdots \\
\left( X^{(k)} - \mu \right)^{T} \hat{e}_{l}
\end{bmatrix} \in \mathbb{R}^{l}$$
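Step #4 then amounts to a single matrix product; a sketch under the same assumptions (the function name `kl_transform` and the argument layout are illustrative):

```python
import numpy as np

# Sketch of Step #4: project each mean-subtracted vector onto the first l
# unit eigenvectors. E_l holds e_1..e_l as columns.
def kl_transform(X, mu, eigvecs, l):
    E_l = eigvecs[:, :l]          # (n, l), leading l unit eigenvectors
    return (X - mu) @ E_l         # (N, l): row k is Y^(k)
```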
Theorem:
The best possible one-dimensional representation of the set of n-dimensional feature vectors can be obtained by projecting the difference of the given vector and the mean vector on the unit principal eigenvector (the unit eigenvector corresponding to the largest eigenvalue) of the covariance matrix ($\Sigma$) of the original feature vectors.
Proof:
Represent each feature vector along the direction $\hat{e}$ through the mean as
$$Y^{(k)} = \mu + t^{(k)} \hat{e} \in \mathbb{R}^{n}$$

$$\text{Total Error } E = \sum_{k=1}^{N} \left\| X^{(k)} - Y^{(k)} \right\|^{2}
= \sum_{k=1}^{N} \left\| X^{(k)} - \left( \mu + t^{(k)} \hat{e} \right) \right\|^{2}
= \sum_{k=1}^{N} \left\| \left( X^{(k)} - \mu \right) - t^{(k)} \hat{e} \right\|^{2}$$

$$= \sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2}
+ \sum_{k=1}^{N} \left( t^{(k)} \right)^{2} \left\| \hat{e} \right\|^{2}
- 2 \sum_{k=1}^{N} t^{(k)} \hat{e}^{T} \left( X^{(k)} - \mu \right)$$

Since $\hat{e}$ is a unit vector, i.e. $\left\| \hat{e} \right\| = 1$,
$$E = \sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2}
+ \sum_{k=1}^{N} \left( t^{(k)} \right)^{2}
- 2 \sum_{k=1}^{N} t^{(k)} \hat{e}^{T} \left( X^{(k)} - \mu \right) \qquad \ldots(1)$$

Setting $\dfrac{\partial E}{\partial t^{(k)}} = 2 t^{(k)} - 2 \hat{e}^{T} \left( X^{(k)} - \mu \right) = 0$,

Hence, $t^{(k)} = \hat{e}^{T} \left( X^{(k)} - \mu \right) = \left( X^{(k)} - \mu \right)^{T} \hat{e} \qquad \ldots(2)$

Substituting (2) in (1),
$$E = \sum_{k=1}^{N} \left\| X^{(k)} - Y^{(k)} \right\|^{2}
= \sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2}
- \sum_{k=1}^{N} \left( t^{(k)} \right)^{2} \qquad \ldots(3)$$

$$= \sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2}
- \sum_{k=1}^{N} \hat{e}^{T} \left( X^{(k)} - \mu \right) \left( X^{(k)} - \mu \right)^{T} \hat{e}$$

$$= \sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2}
- \hat{e}^{T} \left[ \sum_{k=1}^{N} \left( X^{(k)} - \mu \right) \left( X^{(k)} - \mu \right)^{T} \right] \hat{e}$$

$$= \sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2} - \hat{e}^{T} \left( N \Sigma \right) \hat{e}
= \sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2} - N \left( \hat{e}^{T} \Sigma \hat{e} \right) \qquad \ldots(4)$$

The first term $\sum_{k=1}^{N} \left\| X^{(k)} - \mu \right\|^{2}$ is fixed and greater than or equal to zero, so minimising $E$ requires maximising $\hat{e}^{T} \Sigma \hat{e}$. For a unit vector $\hat{e}$,
$$\max_{\hat{e}} \; \hat{e}^{T} \Sigma \hat{e} = \lambda_{\max}$$

Hence, we have to select the maximum eigenvalue ($\lambda_{\max}$) of the covariance matrix $\Sigma$. Thus $\hat{e}$ will be the eigenvector of the covariance matrix corresponding to the maximum eigenvalue ($\lambda_{\max}$), and the best one-dimensional representation will be as follows:
$$t^{(k)} = \hat{e}^{T} \left( X^{(k)} - \mu \right) = \left( X^{(k)} - \mu \right)^{T} \hat{e}$$
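As a small numerical check of the theorem (not part of the notes), the following sketch builds a synthetic data set and shows that, among the unit eigenvectors of $\Sigma$, the one with the largest eigenvalue gives the smallest total one-dimensional reconstruction error $E$ of equation (4):

```python
import numpy as np

# Illustrative check: the principal eigenvector minimises the total
# one-dimensional reconstruction error E (synthetic data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
mu = X.mean(axis=0)
D = X - mu
Sigma = (D.T @ D) / len(X)

eigvals, eigvecs = np.linalg.eigh(Sigma)
for lam, e in zip(eigvals[::-1], eigvecs[:, ::-1].T):
    t = D @ e                                     # t^(k) = (X^(k) - mu)^T e
    E = np.sum((D - np.outer(t, e)) ** 2)         # total error for this direction
    print(f"lambda = {lam:7.3f}   E = {E:9.3f}")  # E is smallest for lambda_max
```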
3. Multiple Discriminant Analysis (MDA):

Motivation
Principal Component Analysis (PCA) does not take class label information into account. PCA may remove the components which are discriminatory.

Example
If PCA is applied to a set of feature vectors representing two types of characters 'O' and 'Q', then the tail portion of 'Q' will be removed, which is the discriminatory component between 'O' and 'Q'. Hence, for classification-based applications the discriminatory components must be preserved after dimensionality reduction.
Training Set $T = \left\{ \left( X^{(i)}, y^{(i)} \right) \right\}_{i=1}^{N}$

$$m_{i} = \text{centre of the } i\text{th class} = \frac{1}{N_{i}} \sum_{X \in C_{i}} X \in \mathbb{R}^{n}$$

$$m = \text{grand mean} = \frac{1}{N} \sum_{i=1}^{k} N_{i} m_{i} \in \mathbb{R}^{n}$$

where $C_{i}$ denotes the set of feature vectors belonging to the $i$th class, $N_{i} = |C_{i}|$, and $k$ is the number of classes.
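A sketch of these class statistics in NumPy (illustrative; it assumes the samples are the rows of `X` and `y` holds integer class labels):

```python
import numpy as np

# Sketch of the class centres m_i and the grand mean m; names are not from the notes.
def class_means(X, y):
    classes = np.unique(y)
    m_i = np.array([X[y == c].mean(axis=0) for c in classes])   # (k, n) class centres
    N_i = np.array([(y == c).sum() for c in classes])           # class sizes
    m = (N_i[:, None] * m_i).sum(axis=0) / len(X)               # grand mean
    return m_i, N_i, m
```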
4. Dimension Reduction Procedure using Multiple Discriminant Analysis (MDA):

Step #1: Computation of the class centres and the grand mean for a given set of feature vectors $\left\{ X^{(i)} \right\}_{i=1}^{N}$,

where $X^{(i)}$ = feature vector of the $i$th sample $\in \mathbb{R}^{n}$,

$$m_{i} = \text{centre of the } i\text{th class} = \frac{1}{N_{i}} \sum_{X \in C_{i}} X \in \mathbb{R}^{n}$$

$$m = \text{grand mean} = \frac{1}{N} \sum_{i=1}^{k} N_{i} m_{i} \in \mathbb{R}^{n}$$

Step #2: Computation of the within-class scatter matrix $S_{W}$ and the between-class scatter matrix $S_{B}$:

$$S_{W} = \sum_{i=1}^{k} \sum_{X \in C_{i}} \left( X - m_{i} \right) \left( X - m_{i} \right)^{T}
\quad \text{and} \quad
S_{B} = \sum_{i=1}^{k} N_{i} \left( m_{i} - m \right) \left( m_{i} - m \right)^{T}$$
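A sketch of the two scatter matrices under the same assumptions (array `X` of samples, integer labels `y`; the names are illustrative):

```python
import numpy as np

# Sketch of S_W and S_B from the formulas above.
def scatter_matrices(X, y):
    n = X.shape[1]
    m = X.mean(axis=0)                                  # grand mean
    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for c in np.unique(y):
        X_c = X[y == c]
        m_c = X_c.mean(axis=0)
        D_c = X_c - m_c
        S_W += D_c.T @ D_c                              # within-class scatter
        S_B += len(X_c) * np.outer(m_c - m, m_c - m)    # between-class scatter
    return S_W, S_B
```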
Step #3: Determination of the eigenvalues of the discrimination matrix $S_{W}^{-1} S_{B} \in \mathbb{R}^{n \times n}$ in descending order.

Step #4: Projection of each mean-subtracted feature vector on the unit eigenvectors $\hat{e}_{1}, \hat{e}_{2}, \ldots, \hat{e}_{l}$ of the discrimination matrix corresponding to the $l$ largest eigenvalues:

$$Y^{(k)} = \begin{bmatrix}
\left( X^{(k)} - m \right)^{T} \hat{e}_{1} \\
\vdots \\
\left( X^{(k)} - m \right)^{T} \hat{e}_{l}
\end{bmatrix} \in \mathbb{R}^{l}$$
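A sketch of Steps #3 and #4. Since $S_{W}^{-1} S_{B}$ is generally not symmetric, `numpy.linalg.eig` is used and the eigenvectors are sorted by eigenvalue; a pseudo-inverse guards against a singular $S_{W}$ (these are illustrative implementation choices, not prescribed by the notes):

```python
import numpy as np

# Sketch of Steps #3-#4: eigen-decomposition of inv(S_W) S_B and projection
# onto the leading l eigenvectors.
def mda_projection(X, S_W, S_B, m, l):
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]        # descending eigenvalues
    E_l = eigvecs[:, order[:l]].real              # leading l eigenvectors
    E_l = E_l / np.linalg.norm(E_l, axis=0)       # normalise to unit length
    return (X - m) @ E_l                          # (N, l): row k is Y^(k)
```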
Theorem:
The best direction corresponding to discrimination is along the unit eigenvector $\hat{W}$ of the discrimination matrix $S_{W}^{-1} S_{B}$ corresponding to its highest eigenvalue $\lambda_{\max}$.
Proof:
Feature vectors $X^{(i)} \in \mathbb{R}^{n}$, $i = 1, 2, 3, \ldots, N$ are now projected on some direction corresponding to the unit vector $\hat{W} \in \mathbb{R}^{n}$ to determine the most discriminative direction. In the transformed domain,
$$Y = \hat{W}^{T} X$$

$$\tilde{m}_{i} = \text{centre of the } i\text{th class in the transformed domain}
= \frac{1}{N_{i}} \sum_{Y \in C_{i}} Y
= \frac{1}{N_{i}} \sum_{X \in C_{i}} \hat{W}^{T} X
= \hat{W}^{T} \left( \frac{1}{N_{i}} \sum_{X \in C_{i}} X \right)
= \hat{W}^{T} m_{i}$$

$$\tilde{m} = \text{grand mean in the transformed domain}
= \frac{1}{N} \sum_{i=1}^{k} N_{i} \tilde{m}_{i}
= \frac{1}{N} \sum_{i=1}^{k} N_{i} \hat{W}^{T} m_{i}
= \hat{W}^{T} \left( \frac{1}{N} \sum_{i=1}^{k} N_{i} m_{i} \right)
= \hat{W}^{T} m$$

In the original domain,
$$S_{W} = \text{within-cluster scattering} = \sum_{i=1}^{k} \sum_{X \in C_{i}} \left( X - m_{i} \right) \left( X - m_{i} \right)^{T}$$

$$S_{B} = \text{between-cluster scattering} = \sum_{i=1}^{k} N_{i} \left( m_{i} - m \right) \left( m_{i} - m \right)^{T}$$

In the transformed domain,
$$\tilde{S}_{W} = \sum_{i=1}^{k} \sum_{Y \in C_{i}} \left( Y - \tilde{m}_{i} \right)^{2}
= \sum_{i=1}^{k} \sum_{X \in C_{i}} \left( \hat{W}^{T} X - \hat{W}^{T} m_{i} \right) \left( \hat{W}^{T} X - \hat{W}^{T} m_{i} \right)^{T}$$
$$= \hat{W}^{T} \left[ \sum_{i=1}^{k} \sum_{X \in C_{i}} \left( X - m_{i} \right) \left( X - m_{i} \right)^{T} \right] \hat{W}
= \hat{W}^{T} S_{W} \hat{W}$$

$$\tilde{S}_{B} = \sum_{i=1}^{k} N_{i} \left( \tilde{m}_{i} - \tilde{m} \right)^{2}
= \sum_{i=1}^{k} N_{i} \left( \hat{W}^{T} m_{i} - \hat{W}^{T} m \right) \left( \hat{W}^{T} m_{i} - \hat{W}^{T} m \right)^{T}$$
$$= \hat{W}^{T} \left[ \sum_{i=1}^{k} N_{i} \left( m_{i} - m \right) \left( m_{i} - m \right)^{T} \right] \hat{W}
= \hat{W}^{T} S_{B} \hat{W}$$

The criterion function to be maximised is the ratio of between-cluster scattering to within-cluster scattering in the transformed domain:
$$J(\hat{W}) = \frac{\tilde{S}_{B}}{\tilde{S}_{W}} = \frac{\hat{W}^{T} S_{B} \hat{W}}{\hat{W}^{T} S_{W} \hat{W}} \qquad \ldots(1)$$

Hence, we have to maximise $J(\hat{W})$. Setting its derivative to zero,
$$\frac{\partial}{\partial \hat{W}} \left( \frac{\hat{W}^{T} S_{B} \hat{W}}{\hat{W}^{T} S_{W} \hat{W}} \right) = 0 \quad (\text{null vector})$$

$$\Rightarrow \frac{\left( \hat{W}^{T} S_{W} \hat{W} \right) \dfrac{\partial}{\partial \hat{W}} \left( \hat{W}^{T} S_{B} \hat{W} \right)
- \left( \hat{W}^{T} S_{B} \hat{W} \right) \dfrac{\partial}{\partial \hat{W}} \left( \hat{W}^{T} S_{W} \hat{W} \right)}
{\left( \hat{W}^{T} S_{W} \hat{W} \right)^{2}} = 0$$

$$\Rightarrow \left( \hat{W}^{T} S_{W} \hat{W} \right) 2 S_{B} \hat{W} - \left( \hat{W}^{T} S_{B} \hat{W} \right) 2 S_{W} \hat{W} = 0$$

$$\Rightarrow \left( \hat{W}^{T} S_{W} \hat{W} \right) S_{B} \hat{W} = \left( \hat{W}^{T} S_{B} \hat{W} \right) S_{W} \hat{W}$$

$$\Rightarrow S_{B} \hat{W} = \frac{\hat{W}^{T} S_{B} \hat{W}}{\hat{W}^{T} S_{W} \hat{W}} \, S_{W} \hat{W}$$

$$\Rightarrow S_{B} \hat{W} = \lambda S_{W} \hat{W} \qquad \ldots(2)$$

where $\lambda = \dfrac{\hat{W}^{T} S_{B} \hat{W}}{\hat{W}^{T} S_{W} \hat{W}}$ = some scalar.

Now
$$J(\hat{W}) = \frac{\hat{W}^{T} S_{B} \hat{W}}{\hat{W}^{T} S_{W} \hat{W}} = \lambda$$

From (2), $S_{W}^{-1} S_{B} \hat{W} = \lambda \hat{W}$, i.e. $\hat{W}$ is an eigenvector of the discrimination matrix $S_{W}^{-1} S_{B}$ with eigenvalue $\lambda = J(\hat{W})$. Hence $J(\hat{W})$ is maximised when $\hat{W}$ is the unit eigenvector corresponding to the highest eigenvalue $\lambda_{\max}$.
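As a small numerical check of this result (not part of the notes), the sketch below builds a synthetic three-class data set and confirms that the criterion $J(\hat{W})$ evaluated at the principal eigenvector of $S_{W}^{-1} S_{B}$ is not exceeded by random directions:

```python
import numpy as np

# Illustrative check: J(W) is largest along the top eigenvector of inv(S_W) S_B.
rng = np.random.default_rng(1)
k, n = 3, 4
means = np.array([[0, 0, 0, 0], [2, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.normal(loc=mu_c, size=(100, n)) for mu_c in means])
y = np.repeat(np.arange(k), 100)

m = X.mean(axis=0)
S_W = sum((X[y == c] - X[y == c].mean(axis=0)).T @ (X[y == c] - X[y == c].mean(axis=0)) for c in range(k))
S_B = sum(100 * np.outer(X[y == c].mean(axis=0) - m, X[y == c].mean(axis=0) - m) for c in range(k))

def J(w):
    # criterion J(W) = (W^T S_B W) / (W^T S_W W); scale-invariant in w
    return (w @ S_B @ w) / (w @ S_W @ w)

eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
W_hat = eigvecs[:, np.argmax(eigvals.real)].real
print("J at principal eigenvector:", J(W_hat))
print("max J over random directions:", max(J(rng.normal(size=n)) for _ in range(1000)))
```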