Deep Learning Basics Lecture 7: Factor Analysis
Math formulation
• First PC:
$$\max_v \; \sum_i (v^\top x_i)^2, \quad \text{s.t. } v^\top v = 1,$$
which is equivalent to
$$\max_v \; v^\top X X^\top v, \quad \text{s.t. } v^\top v = 1,$$
where the columns of $X$ are the data points (a numerical check of this equivalence appears after this list)
• $X X^\top$: the covariance matrix (of the centered data)
• $v$: an eigenvector of the covariance matrix
• First PC: the first eigenvector of the covariance matrix
• Top $k$ PCs: a similar argument shows they are the top $k$ eigenvectors
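As a quick check of this equivalence, the minimal NumPy sketch below (my illustration, not lecture code; the data are synthetic and centered) verifies that $\sum_i (v^\top x_i)^2 = v^\top X X^\top v$ and that the top eigenvector of $X X^\top$ attains the maximum over random unit directions.

```python
# Minimal numerical sketch (not from the lecture): check that
# sum_i (v^T x_i)^2 = v^T X X^T v, and that the top eigenvector of
# X X^T maximizes this objective over random unit vectors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))          # columns are data points
X = X - X.mean(axis=1, keepdims=True)  # center the data

C = X @ X.T                            # (unnormalized) covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
v_top = eigvecs[:, -1]                 # first PC: top eigenvector

def objective(v):
    return np.sum((v @ X) ** 2)        # sum_i (v^T x_i)^2

# Identity check: sum_i (v^T x_i)^2 == v^T X X^T v
assert np.isclose(objective(v_top), v_top @ C @ v_top)

# The top eigenvector should beat random unit directions
for _ in range(1000):
    v = rng.normal(size=5)
    v /= np.linalg.norm(v)
    assert objective(v) <= objective(v_top) + 1e-8
print("top eigenvector attains the maximum:", objective(v_top))
```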
Computation: Eigen-decomposition
• Top $k$ PCs: the top $k$ eigenvectors of $X X^\top$, i.e., the columns of $U$ in
$$X X^\top U = U \Lambda,$$
where $\Lambda$ is a diagonal matrix (of eigenvalues)
• $U$: the left singular vectors of $X$ (see the sketch below)
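A minimal sketch of this computation, assuming NumPy and synthetic data (my illustration, not lecture code): the top-$k$ eigenvectors of $X X^\top$ agree, up to sign, with the top-$k$ left singular vectors of $X$.

```python
# Sketch (assumes NumPy): top-k PCs via eigen-decomposition of X X^T,
# compared against the left singular vectors of X.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 500))
X = X - X.mean(axis=1, keepdims=True)    # center
k = 3

# Eigen-decomposition of the covariance matrix: X X^T U = U Lambda
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
U_eig = eigvecs[:, ::-1][:, :k]          # top-k eigenvectors (largest eigenvalues first)

# Left singular vectors of X span the same top-k subspace
U_svd, S, Vt = np.linalg.svd(X, full_matrices=False)
U_svd = U_svd[:, :k]

# Columns agree up to sign, so |U_eig^T U_svd| should be the identity
assert np.allclose(np.abs(U_eig.T @ U_svd), np.eye(k), atol=1e-6)
print("eigenvalues (top-k):        ", eigvals[::-1][:k])
print("squared singular values (k):", S[:k] ** 2)
```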
A latent variable view of PCA
• Consider the top $k$ PCs $U$
• Let $h_i = U^\top x_i$
• Data point viewed as $x_i = U h_i + \text{noise}$ (illustrated in the sketch after this list)
• PCA structure assumption: $h$ has low dimension. What about other assumptions?
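A small sketch of this view, assuming NumPy and synthetic data generated to lie near a $k$-dimensional subspace (the dimensions and noise level are arbitrary choices of mine, not from the lecture): compute the codes $h_i = U^\top x_i$ and check that $U h_i$ reconstructs $x_i$ up to a small residual.

```python
# Sketch of the latent-variable view (assumes NumPy; synthetic data):
# h_i = U^T x_i is a low-dimensional code, and U h_i reconstructs x_i
# up to the discarded "noise" directions.
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 20, 300, 5

# Data approximately spanned by k directions plus small noise
X = rng.normal(size=(d, k)) @ rng.normal(size=(k, n)) + 0.05 * rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)

# Top-k PCs from the covariance matrix
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
U = eigvecs[:, ::-1][:, :k]

H = U.T @ X          # latent codes h_i = U^T x_i   (k x n)
X_hat = U @ H        # reconstructions x_i ≈ U h_i  (d x n)

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error with k={k}: {rel_err:.4f}")  # should be small
```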
Sparse coding
• Structure assumption: $h$ is sparse, i.e., $\|h\|_0$ is small
• Dimension of $h$ can be large
Sparse coding
• Latent variable probabilistic model view:
$$x \mid h \sim N\!\left(W h, \tfrac{1}{\beta} I\right), \quad h \text{ is sparse}$$
• E.g., sparsity from a Laplacian prior: $p(h) = \frac{\lambda}{2} \exp\!\left(-\frac{\lambda}{2} \|h\|_1\right)$ (sampled in the sketch after this list)
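To make the generative story concrete, here is a small sketch of my own (not lecture code; $W$, $\lambda$, $\beta$ and the dimensions are arbitrary illustrative values, NumPy assumed) that samples $h$ from the Laplacian prior and then draws $x = W h$ plus Gaussian noise of variance $1/\beta$.

```python
# Sketch of the sparse-coding generative model (assumes NumPy; W, lam,
# beta are arbitrary illustrative choices, not values from the lecture):
# draw h from a Laplacian prior, then x = W h + Gaussian noise of variance 1/beta.
import numpy as np

rng = np.random.default_rng(3)
d, m = 16, 64            # observed dimension d, latent dimension m (can be large)
lam, beta = 8.0, 100.0

W = rng.normal(size=(d, m)) / np.sqrt(d)   # dictionary

# Laplacian prior p(h) ∝ exp(-(lam/2) * ||h||_1): i.i.d. Laplace(scale = 2/lam) entries
h = rng.laplace(loc=0.0, scale=2.0 / lam, size=m)

# Observation model: x | h ~ N(W h, (1/beta) I)
x = W @ h + rng.normal(scale=np.sqrt(1.0 / beta), size=d)

print("median |h_j|:", np.median(np.abs(h)))   # Laplace prior concentrates mass near zero
```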
Sparse coding
• Suppose $W$ is known. The MLE (more precisely, the MAP estimate) of $h$ is
$$h^* = \arg\max_h \; \log p(h \mid x)$$
$$h^* = \arg\min_h \; \lambda \|h\|_1 + \beta \|x - W h\|_2^2$$
(a minimal solver sketch follows)
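One standard way to solve this optimization problem is proximal gradient descent (ISTA, iterative soft-thresholding). The sketch below is my own illustration under assumed NumPy and arbitrary $W$, $x$, $\lambda$, $\beta$, not code from the lecture.

```python
# Sketch (not lecture code): solving  h* = argmin_h  lam*||h||_1 + beta*||x - W h||_2^2
# with ISTA (proximal gradient / iterative soft-thresholding). NumPy assumed;
# W, x, lam, beta below are arbitrary illustrative values.
import numpy as np

def ista(W, x, lam, beta, n_iters=500):
    """Proximal gradient descent on the sparse-coding objective."""
    L = 2.0 * beta * np.linalg.norm(W, 2) ** 2   # Lipschitz constant of the smooth part
    t = 1.0 / L                                  # step size
    h = np.zeros(W.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * beta * W.T @ (W @ h - x)    # gradient of beta*||x - W h||^2
        z = h - t * grad
        h = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)  # soft-threshold: prox of lam*||.||_1
    return h

rng = np.random.default_rng(4)
d, m = 16, 64
W = rng.normal(size=(d, m)) / np.sqrt(d)
h_true = np.zeros(m)
h_true[rng.choice(m, 5, replace=False)] = rng.normal(size=5)   # sparse ground truth
x = W @ h_true + 0.01 * rng.normal(size=d)

h_hat = ista(W, x, lam=0.05, beta=50.0)
print("nonzeros in h_hat:   ", np.sum(np.abs(h_hat) > 1e-6))
print("reconstruction error:", np.linalg.norm(x - W @ h_hat))
```

FISTA or a coordinate-descent Lasso solver would converge faster; ISTA is shown here because the soft-thresholding step directly mirrors the $\ell_1$ term in the objective.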
• Memory networks