Dimensionality Reduction
Fereshteh Sadeghi
CSEP 546
Motivation
● Clustering
○ One way to summarize a complex real-valued data point with a single categorical variable
● Dimensionality reduction
○ Another way to simplify complex high-dimensional data
○ Summarize data with a lower dimensional real valued vector
Data Compression
Reduce data from 2D to 1D
[Figure: 2D points, with axes in inches and cm, projected onto a single line. Slide credit: Andrew Ng]
Data Compression
Reduce data from 3D to 2D
[Figure: 3D points projected onto a 2D plane. Slide credit: Andrew Ng]
Principal Component Analysis (PCA) problem formulation
(Slide credit: Andrew Ng)
Covariance
● Variance and covariance:
○ Measures of the “spread” of a set of points around their center of mass (mean)
● Variance:
○ Measures the deviation from the mean for points in one dimension
● Covariance:
○ Measures how much each of the dimensions varies from the mean with respect to the others
The sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
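Written out in the same way, the variance and covariance described above are (standard sample definitions; the slide's own equations did not survive extraction):

$$\operatorname{var}(X) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \operatorname{cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$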
Eigenvector and Eigenvalue
$Ax = \lambda x$
A: square matrix
λ: eigenvalue, or characteristic value
x: eigenvector, or characteristic vector
• The zero vector cannot be an eigenvector
• The value zero can be an eigenvalue
Eigenvector and Eigenvalue
Example
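The example on the original slide was an image and is not recoverable; a comparable worked 2×2 example:

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}: \quad \det(A - \lambda I) = (2-\lambda)(3-\lambda) = 0, \quad \text{so } \lambda_1 = 2,\ \lambda_2 = 3$$

with eigenvectors $x_1 = (1, 0)^\top$ and $x_2 = (1, 1)^\top$. Check: $A x_2 = (3, 3)^\top = 3\,x_2$.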
Eigenvector and Eigenvalue
$Ax = \lambda x$
$Ax - \lambda x = 0$
$(A - \lambda I)x = 0$
If we define a new matrix $B = A - \lambda I$:
$Bx = 0$
If B has an inverse, then $x = B^{-1}0 = 0$. BUT an eigenvector cannot be zero!
So x will be an eigenvector of A if and only if B does not have an inverse, or equivalently $\det(B) = 0$:
$\det(A - \lambda I) = 0$
Eigenvector and Eigenvalue
Example 1: Find the eigenvalues of [matrix on slide; not recoverable].
λ = 2 is an eigenvalue of multiplicity 3.
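A quick numerical check is easy with NumPy. Since the slide's matrix is not recoverable, the matrix below is an illustrative stand-in chosen to also have λ = 2 with multiplicity 3:

import numpy as np

# Illustrative upper-triangular matrix: its eigenvalues are the
# diagonal entries, so lambda = 2 appears with multiplicity 3.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                    # -> [2. 2. 2.]

# Verify the defining equation A x = lambda x for the first pair.
x, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(A @ x, lam * x))    # -> True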
Principal Component Analysis
Input: data points $x^{(i)} \in \mathbb{R}^D$
Set of basis vectors: $u_1, \dots, u_K$
Summarize a D-dimensional vector x with a K-dimensional feature vector $h(x)$
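A minimal from-scratch sketch of this projection in Python, using the standard eigendecomposition-of-the-covariance formulation (the function and variable names are my own, not from the slides):

import numpy as np

def pca(X, n_components):
    """Project D-dimensional rows of X onto the top-K principal components."""
    # Center the data: principal directions are defined relative to the mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the features (D x D).
    cov = np.cov(Xc, rowvar=False)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the K eigenvectors with the largest eigenvalues (maximum variance).
    components = eigvecs[:, ::-1][:, :n_components]
    # h(x): the K-dimensional summary of each D-dimensional point.
    return Xc @ components

# Example: compress 100 points from 5D down to 2D.
X = np.random.randn(100, 5)
H = pca(X, n_components=2)
print(H.shape)   # -> (100, 2)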
Principal Component Analysis
• The top three principal components of SIFT descriptors from a set of images are computed
• These principal components are mapped to the principal components of the RGB space
• Pixels with similar colors then share similar local structure
Application: Image Compression
[Figure: original image]
http://en.wikipedia.org/wiki/Discrete_cosine_transform
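The linked page describes the discrete cosine transform used in JPEG. A minimal sketch of DCT-based compression of one 8×8 block, keeping only low-frequency coefficients (a generic illustration, not necessarily the slides' exact pipeline):

import numpy as np
from scipy.fft import dctn, idctn

# An 8x8 image block (random stand-in for real pixel data).
block = np.random.rand(8, 8)

# 2D DCT, as used in JPEG (type-II, orthonormal).
coeffs = dctn(block, norm='ortho')

# "Compress" by keeping only the low-frequency 4x4 corner.
compressed = np.zeros_like(coeffs)
compressed[:4, :4] = coeffs[:4, :4]

# Reconstruct with the inverse DCT; for natural images most energy
# concentrates in the low frequencies, so little is lost.
reconstruction = idctn(compressed, norm='ortho')
print(np.abs(block - reconstruction).mean())  # average reconstruction error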
Dimensionality Reduction
● PCA (Principal Component Analysis):
○ Finds the projection that maximizes the variance
● ICA (Independent Component Analysis):
○ Very similar to PCA, except that it assumes non-Gaussian features
● Multidimensional Scaling (MDS):
○ Finds the projection that best preserves inter-point distances
● LDA (Linear Discriminant Analysis):
○ Maximizes class separation along the projected axes (see the sketch after this list)
●…
●…
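To make the PCA-vs-LDA contrast concrete, a short sketch using scikit-learn on toy data (an illustration; the slides do not prescribe this library):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it picks directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it picks directions that best separate the classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # -> (150, 2) (150, 2)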