Intro To PCA
Adapted from G. Piatetsky-Shapiro, Biologically Inspired Intelligent Systems (Lecture 7), and R. Gutierrez-Osuna's lecture.
Attribute Construction
It is better to have a fair modeling method and good variables than to have the best modeling method and poor variables. Examples:
- People are eligible for pension withdrawal at age 59½. Create it as a separate Boolean variable!
- Household income as the sum of spouses' incomes in loan underwriting.
Advanced methods exist for automatically examining variable combinations, but they are very computationally expensive!
Variance
A measure of the spread of the data in a data set
s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
Covariance
- Variance: a measure of the deviation from the mean for points in one dimension, e.g., heights.
- Covariance: a measure of how much each of the dimensions varies from the mean with respect to each other.
- Covariance is measured between 2 dimensions to see if there is a relationship between them, e.g., number of hours studied and grade obtained.
- The covariance between one dimension and itself is the variance.
Covariance
var(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}

cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
So, if you had a 3-dimensional data set (x,y,z), then you could measure the covariance between the x and y dimensions, the y and z dimensions, and the x and z dimensions.
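As a concrete illustration, here is a minimal sketch of these formulas using NumPy; the X and Y arrays are made-up hours-studied/marks data, not values from this lecture:

```python
import numpy as np

# Hypothetical data: hours studied (X) and marks obtained (Y)
X = np.array([9.0, 15.0, 25.0, 14.0, 10.0, 18.0])
Y = np.array([39.0, 56.0, 93.0, 61.0, 50.0, 75.0])

n = len(X)
var_X  = np.sum((X - X.mean()) ** 2) / (n - 1)               # sample variance of X
cov_XY = np.sum((X - X.mean()) * (Y - Y.mean())) / (n - 1)   # sample covariance of X and Y

# NumPy's built-ins give the same results
assert np.isclose(var_X,  np.var(X, ddof=1))
assert np.isclose(cov_XY, np.cov(X, Y)[0, 1])
```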
Covariance
What is the interpretation of covariance calculations? Say you have a 2-dimensional data set:
- X: number of hours studied for a subject
- Y: marks obtained in that subject
Assume the covariance value between X and Y is 104.53. What does this value mean?
Covariance
The exact value is not as important as its sign.
- A positive covariance indicates that both dimensions increase or decrease together, e.g., as the number of hours studied increases, the grades in that subject also increase.
- A negative value indicates that while one increases the other decreases, or vice-versa, e.g., active social life at BYU vs. performance in the CS Dept.
- If the covariance is zero, the two dimensions are uncorrelated and vary independently of each other, e.g., heights of students vs. grades obtained in a subject.
Covariance
Why bother with calculating (expensive) covariance when we could just plot the 2 values to see their relationship?
Covariance calculations are used to find relationships between dimensions in high dimensional data sets (usually greater than 3) where visualization is difficult.
Covariance Matrix
Representing covariance among dimensions as a matrix, e.g., for 3 dimensions:
C = \begin{pmatrix} cov(X,X) & cov(X,Y) & cov(X,Z) \\ cov(Y,X) & cov(Y,Y) & cov(Y,Z) \\ cov(Z,X) & cov(Z,Y) & cov(Z,Z) \end{pmatrix}
Properties:
- Diagonal: the variances of the variables.
- cov(X,Y) = cov(Y,X), hence the matrix is symmetric about the diagonal (only the upper triangle is distinct).
- n-dimensional data results in an n x n covariance matrix.
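A minimal sketch of building such a matrix, assuming NumPy and a small made-up 3-dimensional data set:

```python
import numpy as np

# Hypothetical observations of three dimensions x, y, z (one row per observation)
data = np.array([
    [2.5, 2.4, 1.1],
    [0.5, 0.7, 0.9],
    [2.2, 2.9, 1.0],
    [1.9, 2.2, 0.6],
])

# rowvar=False: columns are the dimensions, rows are the observations
C = np.cov(data, rowvar=False)

print(C.shape)               # (3, 3): n-dimensional data gives an n x n matrix
print(np.allclose(C, C.T))   # True: cov(X,Y) == cov(Y,X), so C is symmetric
print(np.diag(C))            # the variances of x, y, z sit on the diagonal
```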
Transformation Matrices
Consider the following:
\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}

The square (transformation) matrix scales (3,2) by 4. Now assume we take a multiple of (3,2), say 2 x (3,2) = (6,4):

\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 24 \\ 16 \end{pmatrix} = 4 \begin{pmatrix} 6 \\ 4 \end{pmatrix}
Transformation Matrices
Scale the vector (3,2) by a value of 2 to get (6,4), then multiply by the square transformation matrix: the result is still scaled by 4. WHY? A vector consists of both length and direction; scaling a vector only changes its length, not its direction. This is an important observation about matrix transformations, and it leads to the notions of eigenvectors and eigenvalues. Irrespective of how much we scale (3,2), the result (under the given transformation matrix) is always scaled by 4.
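The same observation can be checked numerically; a short sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2, 3],
              [2, 1]])        # the square transformation matrix from the example above
v = np.array([3, 2])

print(A @ v)                  # [12  8]  == 4 * (3, 2)
print(A @ (2 * v))            # [24 16]  == 4 * (6, 4): still scaled by 4
```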
Eigenvalue Problem
The eigenvalue problem is any problem having the following form:

A \cdot v = \lambda \cdot v

where A is an n x n matrix, v is an n x 1 non-zero vector, and \lambda is a scalar. Any value of \lambda for which this equation has a solution is called an eigenvalue of A, and the vector v which corresponds to this value is called an eigenvector of A.
Eigenvalue Problem
Going back to our example:
\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}

A \cdot v = \lambda \cdot v
Therefore, (3,2) is an eigenvector of the square matrix A, and 4 is an eigenvalue of A. The question is: given a matrix A, how can we calculate the eigenvectors and eigenvalues of A?
Let

A = \begin{pmatrix} 0 & 1 \\ -2 & -3 \end{pmatrix}

Then:

|A - \lambda \cdot I| = \begin{vmatrix} 0 - \lambda & 1 \\ -2 & -3 - \lambda \end{vmatrix} = -\lambda(-3 - \lambda) + 2 = \lambda^2 + 3\lambda + 2 = 0

(\lambda + 1)(\lambda + 2) = 0 \;\Rightarrow\; \lambda_1 = -1, \quad \lambda_2 = -2
For each eigenvalue, solve (A - \lambda \cdot I) \cdot v = 0. For \lambda_1 = -1:

(A - \lambda_1 I) \cdot v_1 = \begin{pmatrix} 1 & 1 \\ -2 & -2 \end{pmatrix} v_1 = 0

Therefore the first eigenvector is any column vector in which the two elements have equal magnitude and opposite sign:

v_1 = k_1 \begin{pmatrix} 1 \\ -1 \end{pmatrix}

Similarly, for \lambda_2 = -2 the second eigenvector is

v_2 = k_2 \begin{pmatrix} 1 \\ -2 \end{pmatrix}

where k_1 and k_2 are any non-zero constants.
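In practice these values are rarely computed by hand; a sketch that checks the example above with NumPy's eigensolver:

```python
import numpy as np

A = np.array([[ 0,  1],
              [-2, -3]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # -1 and -2 (the order may differ)
print(eigenvectors)   # columns are unit-length multiples of (1, -1) and (1, -2)
```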
PCA
Principal components analysis (PCA) is a technique that can be used to simplify a dataset. It is a linear transformation that chooses a new coordinate system for the data set such that:
- the greatest variance by any projection of the data set comes to lie on the first axis (then called the first principal component),
- the second greatest variance on the second axis,
- etc.
PCA can be used for reducing dimensionality by eliminating the later principal components.
PCA
By finding the eigenvalues and eigenvectors of the covariance matrix, we find that the eigenvectors with the largest eigenvalues correspond to the dimensions that have the strongest correlation in the dataset. These are the principal components. PCA is a useful statistical technique that has found application in:
- fields such as face recognition and image compression,
- finding patterns in data of high dimension.
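A minimal end-to-end sketch of PCA via the covariance-matrix eigendecomposition described above, assuming NumPy (the random `data` array is only a placeholder):

```python
import numpy as np

def pca(data, p):
    """Project `data` (rows = samples) onto its first p principal components."""
    mean = data.mean(axis=0)
    centered = data - mean                    # subtract the mean from each dimension
    cov = np.cov(centered, rowvar=False)      # covariance matrix of the dimensions
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]         # sort eigenvalues largest-first
    components = eigvecs[:, order[:p]]        # keep the p eigenvectors with largest eigenvalues
    return centered @ components              # data expressed in the new coordinate system

data = np.random.rand(100, 5)                 # placeholder: 100 samples, 5 dimensions
reduced = pca(data, p=2)                      # shape (100, 2)
```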
(Source: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf)
With n dimensions in your data:
- calculate n eigenvectors and eigenvalues,
- choose only the first p eigenvectors,
- the final data set has only p dimensions.
The proportion of the total variance retained by keeping the first p components is

\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{n} \lambda_i} = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_p}{\lambda_1 + \lambda_2 + \cdots + \lambda_n}
If the dimensions are highly correlated, there will be a small number of eigenvectors with large eigenvalues, and p will be much smaller than n. If the dimensions are not correlated, p will be as large as n and PCA does not help.
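A small sketch of this trade-off, with made-up eigenvalues (assuming NumPy):

```python
import numpy as np

eigvals = np.array([4.2, 2.9, 0.3, 0.1])   # hypothetical eigenvalues, sorted descending
p = 2

retained = eigvals[:p].sum() / eigvals.sum()
print(retained)    # ~0.947: the first 2 of 4 components keep about 95% of the variance
```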
(take the eigenvectors to keep from the ordered list of eigenvectors, and form a matrix with these eigenvectors in the columns)
We can either form a feature vector with both of the eigenvectors:

\begin{pmatrix} -0.677873399 & -0.735178656 \\ -0.735178656 & 0.677873399 \end{pmatrix}

or we can choose to leave out the smaller, less significant component and only have a single column:

\begin{pmatrix} -0.677873399 \\ -0.735178656 \end{pmatrix}
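A sketch of forming both feature vectors, assuming NumPy and reusing the numbers above (signs as in the cited tutorial):

```python
import numpy as np

eig1 = np.array([-0.677873399, -0.735178656])   # eigenvector with the larger eigenvalue
eig2 = np.array([-0.735178656,  0.677873399])   # eigenvector with the smaller eigenvalue

feature_vector_full    = np.column_stack([eig1, eig2])   # keep both components (2 x 2)
feature_vector_reduced = eig1.reshape(-1, 1)              # keep only the significant one (2 x 1)
```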
FinalData = RowFeatureVector x RowZeroMeanData

where RowFeatureVector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top, and RowZeroMeanData is the mean-adjusted data transposed, i.e., the data items are in each column, with each row holding a separate dimension.
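A self-contained sketch of this step (assuming NumPy; the small 2-D `data` array is hypothetical):

```python
import numpy as np

# Hypothetical 2-D data: rows are samples, columns are dimensions
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])
mean = data.mean(axis=0)

cov = np.cov(data - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]

row_feature_vector = eigvecs[:, order].T    # eigenvectors in rows, most significant first
row_zero_mean_data = (data - mean).T        # each row one dimension, each column one data item

final_data = row_feature_vector @ row_zero_mean_data   # the data in the new basis
```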
Then:
RowZeroMeanData = RowFeatureVector^{-1} x FinalData
And thus:
RowOriginalData = (RowFeatureVector^{-1} x FinalData) + OriginalMean
If we use unit eigenvectors, the inverse is the same as the transpose (hence, easier).
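Continuing the projection sketch above, a minimal sketch of recovering the original data, assuming NumPy and keeping all components (unit eigenvectors, so the transpose serves as the inverse):

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])
mean = data.mean(axis=0)

cov = np.cov(data - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
row_feature_vector = eigvecs[:, np.argsort(eigvals)[::-1]].T
final_data = row_feature_vector @ (data - mean).T

# Unit eigenvectors: RowFeatureVector^{-1} equals RowFeatureVector^T
row_zero_mean_data = row_feature_vector.T @ final_data
row_original_data  = row_zero_mean_data + mean.reshape(-1, 1)

print(np.allclose(row_original_data.T, data))   # True when all components are kept
```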
References
- PCA tutorial: http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
- Wikipedia: http://en.wikipedia.org/wiki/Principal_component_analysis
- Wikipedia: http://en.wikipedia.org/wiki/Eigenface