L11.2 Probabilistic Models: Expectation Maximization (EM)
Khoat Than
School of Information and Communication Technology
Hanoi University of Science and Technology
2022
Content
¡ Practical advice
Difficult situations
¡ No closed-form solution for the learning/inference problem?
(no closed-form solution can be found immediately)
¨ The earlier examples were easy cases, since we could find closed-form solutions by setting the gradient to zero.
¨ Many models (e.g., GMM) do not admit a closed-form solution
¡ No explicit expression of the density/mass function?
(no explicit formula is available for computation)
Expectation Maximization
The EM algorithm
GMM revisit
q Consider learning GMM, with K Gaussian distributions, from the training
data D = {x1, x2, …, xM}.
q The density function is $p(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\phi}) = \sum_{k=1}^{K} \phi_k \, \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$
¡ At iteration $t$:
¨ E step: compute $T_{ki} = P(z = k \mid \boldsymbol{x}_i, \boldsymbol{\theta}^{(t)}) = \phi_k^{(t)} \, \mathcal{N}(\boldsymbol{x}_i \mid \boldsymbol{\mu}_k^{(t)}, \boldsymbol{\Sigma}_k^{(t)}) / C$
for any index $k = 1, \dots, K$ and $i = 1, \dots, M$
¨ M step: for any $k$, update
$\phi_k^{(t+1)} = \frac{a_k}{M}$, where $a_k = \sum_{i=1}^{M} T_{ki}$;
$\boldsymbol{\mu}_k^{(t+1)} = \frac{1}{a_k} \sum_{i=1}^{M} T_{ki} \, \boldsymbol{x}_i$; \quad $\boldsymbol{\Sigma}_k^{(t+1)} = \frac{1}{a_k} \sum_{i=1}^{M} T_{ki} \, (\boldsymbol{x}_i - \boldsymbol{\mu}_k^{(t+1)})(\boldsymbol{x}_i - \boldsymbol{\mu}_k^{(t+1)})^{\top}$
(a NumPy sketch of these updates is given below)
https://en.wikipedia.org/wiki/Expectation-maximization_algorithm
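Below is a minimal NumPy/SciPy sketch of these E and M updates (function and variable names are my own, not from the lecture; initialization and stopping criteria are simplified):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """Minimal EM for a Gaussian Mixture Model.
    X: (M, d) data matrix; K: number of mixture components."""
    rng = np.random.default_rng(seed)
    M, d = X.shape
    # Initialization: K random data points as means, identity covariances, uniform weights
    mu = X[rng.choice(M, K, replace=False)]          # (K, d)
    Sigma = np.stack([np.eye(d)] * K)                # (K, d, d)
    phi = np.full(K, 1.0 / K)                        # (K,)
    for _ in range(n_iters):
        # E step: responsibilities T[k, i] = P(z = k | x_i, theta^(t))
        T = np.stack([phi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                      for k in range(K)])            # (K, M)
        T /= T.sum(axis=0, keepdims=True)            # normalize over k (the constant C)
        # M step: re-estimate phi, mu, Sigma from the responsibilities
        a = T.sum(axis=1)                            # a_k = sum_i T_ki
        phi = a / M
        mu = (T @ X) / a[:, None]
        for k in range(K):
            diff = X - mu[k]                         # (M, d)
            Sigma[k] = (T[k, :, None] * diff).T @ diff / a[k]
            Sigma[k] += 1e-6 * np.eye(d)             # small jitter for numerical stability
    return phi, mu, Sigma
```

For example, `phi, mu, Sigma = em_gmm(X, K=3)` fits a 3-component GMM to a data matrix `X` of shape (M, d).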
GMM: comparison with K-means
q K-means:
¨ Step 1: hard assignment of each data point to one cluster
q GMM clustering:
¨ Soft assignment of data points to the clusters
https://en.wikipedia.org/wiki/Expectation-maximization_algorithm
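As a concrete illustration of the difference (a toy snippet with made-up numbers, reusing the responsibilities $T_{ki}$ from the GMM sketch above):

```python
import numpy as np

# Toy responsibilities T[k, i] = P(z = k | x_i) for K = 2 clusters, M = 3 points
T = np.array([[0.9, 0.2, 0.5],
              [0.1, 0.8, 0.5]])

soft = T                    # GMM: keep the full posterior over clusters for each point
hard = T.argmax(axis=0)     # K-means-style: commit each point to a single cluster
print(soft[:, 2], hard[2])  # the third point is ambiguous: [0.5 0.5] vs. a forced choice of cluster 0
```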
General models
¡ The EM algorithm can be formulated for more general models.
¡ Consider a model 𝐵(𝒙, 𝒛; 𝜽) with observed variable x, hidden variable z,
and parameterized by 𝜽
(a model with an observed variable x, a hidden variable z, and parameters 𝜽)
¨ x depends on z and 𝛉, while z may depend on 𝛉
¨ Mixture models: each observed data point has a corresponding latent variable,
specifying the mixture component which generated the data point
¡ The learning task is to find a specific model, from the model family
parameterized by 𝜽, that maximizes the log-likelihood of training data D:
$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}} \log P(\boldsymbol{D} \mid \boldsymbol{\theta})$
¡ We assume D consists of i.i.d. samples of x, so that the log-likelihood function can be expressed analytically and the quantity $LB(\boldsymbol{\theta}) = \sum_{\boldsymbol{x} \in \boldsymbol{D}} \mathbb{E}_{z \mid \boldsymbol{x}, \boldsymbol{\theta}} \log P(\boldsymbol{x}, z \mid \boldsymbol{\theta})$ can be computed easily (the log-likelihood function can be written explicitly)
¨ Since there is a latent variable, MLE may not have a closed-form solution
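To connect these two statements (a standard derivation, added here for completeness and not from the slides): for any distribution $q(z)$ over the latent variable, Jensen's inequality gives a lower bound on the log-likelihood of one observation; choosing $q(z) = P(z \mid \boldsymbol{x}, \boldsymbol{\theta}^{(t)})$ and summing over $\boldsymbol{x} \in \boldsymbol{D}$ yields the bound $LB(\boldsymbol{\theta})$ that EM maximizes, up to an entropy term that does not depend on $\boldsymbol{\theta}$.

```latex
\begin{align*}
\log P(\boldsymbol{x} \mid \boldsymbol{\theta})
  &= \log \sum_{z} P(\boldsymbol{x}, z \mid \boldsymbol{\theta})
   = \log \sum_{z} q(z)\, \frac{P(\boldsymbol{x}, z \mid \boldsymbol{\theta})}{q(z)} \\
  &\ge \sum_{z} q(z) \log \frac{P(\boldsymbol{x}, z \mid \boldsymbol{\theta})}{q(z)}
   && \text{(Jensen's inequality; $\log$ is concave)} \\
  &= \mathbb{E}_{q(z)} \log P(\boldsymbol{x}, z \mid \boldsymbol{\theta}) + H[q]
   && \text{($H[q] \ge 0$ for discrete $z$, and is independent of $\boldsymbol{\theta}$)}
\end{align*}
```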
The Expectation Maximization algorithm
¡ The Expectation maximization (EM) algorithm was introduced in 1977 by
Arthur Dempster, Nan Laird, and Donald Rubin.
¡ The EM algorithm maximizes the lower bound of the log-likelihood
$L(\boldsymbol{\theta}; \boldsymbol{D}) = \log P(\boldsymbol{D} \mid \boldsymbol{\theta}) \;\ge\; LB(\boldsymbol{\theta}) = \sum_{\boldsymbol{x} \in \boldsymbol{D}} \mathbb{E}_{z \mid \boldsymbol{x}, \boldsymbol{\theta}} \log P(\boldsymbol{x}, z \mid \boldsymbol{\theta})$
¡ Initialization: $\boldsymbol{\theta}^{(0)}$, $t = 0$
¡ At iteration 𝑡:
¨ E step: compute the expectation $Q(\boldsymbol{\theta} \mid \boldsymbol{\theta}^{(t)}) = \sum_{\boldsymbol{x} \in \boldsymbol{D}} \mathbb{E}_{z \mid \boldsymbol{x}, \boldsymbol{\theta}^{(t)}} \log P(\boldsymbol{x}, z \mid \boldsymbol{\theta})$
(compute the expectation function Q while keeping the value $\boldsymbol{\theta}^{(t)}$ known from the previous step fixed)
¨ M step: update $\boldsymbol{\theta}^{(t+1)} = \arg\max_{\boldsymbol{\theta}} Q(\boldsymbol{\theta} \mid \boldsymbol{\theta}^{(t)})$
¨ Example (general mixture models): $f(\boldsymbol{x}; \boldsymbol{\theta}, \boldsymbol{\phi}) = \sum_{k=1}^{K} \phi_k \, f_k(\boldsymbol{x} \mid \boldsymbol{\theta}_k)$
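A minimal skeleton of this general EM loop (illustrative Python; `e_step` and `m_step` are placeholders for the model-specific computations, not functions from the lecture):

```python
import numpy as np

def em(data, theta_init, e_step, m_step, n_iters=100, tol=1e-6):
    """Generic EM loop: alternately compute and maximize Q(theta | theta_t).

    e_step(data, theta) -> posterior statistics needed for Q (e.g. responsibilities)
    m_step(data, stats) -> new parameters, argmax_theta Q(theta | theta_t)
    theta is assumed to be a flat NumPy array here, for the convergence check.
    """
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(n_iters):
        stats = e_step(data, theta)                                # E step
        new_theta = np.asarray(m_step(data, stats), dtype=float)   # M step
        if np.max(np.abs(new_theta - theta)) < tol:                # stop when parameters stabilize
            theta = new_theta
            break
        theta = new_theta
    return theta
```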
¡ Amortized inference
¨ Neural networks for doing Bayesian inference
¨ Learning to do inference
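A hedged sketch of the "learning to do inference" idea (names and dimensions are made up; the weights would be trained, e.g. with the ELBO as in Kingma and Welling's auto-encoding variational Bayes): a small network maps an observation x directly to the parameters of an approximate posterior q(z | x), so inference at test time is a single forward pass rather than an iterative optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 10, 2                      # dimensions of x and of the latent z (illustrative)

# Inference-network weights (untrained here; a single linear layer for brevity)
W_mu, b_mu = rng.normal(size=(d_z, d_x)), np.zeros(d_z)
W_lv, b_lv = rng.normal(size=(d_z, d_x)), np.zeros(d_z)

def infer(x):
    """Amortized inference: one forward pass gives q(z | x) = N(mu(x), diag(exp(log_var(x))))."""
    mu = W_mu @ x + b_mu
    log_var = W_lv @ x + b_lv
    return mu, np.exp(log_var)

mu, var = infer(rng.normal(size=d_x))   # approximate-posterior parameters for one observation
```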
Reference
¡ Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for
statisticians." Journal of the American Statistical Association 112, no. 518 (2017): 859-877.
¡ Blundell, Charles, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. "Weight
Uncertainty in Neural Network." In International Conference on Machine Learning (ICML), pp.
1613-1622. 2015.
¡ Dempster, A.P.; Laird, N.M.; Rubin, D.B. (1977). "Maximum Likelihood from Incomplete Data via
the EM Algorithm". Journal of the Royal Statistical Society, Series B. 39 (1): 1-38.
¡ Ghahramani, Zoubin. "Probabilistic machine learning and artificial intelligence." Nature 521,
no. 7553 (2015): 452-459.
¡ Kingma, Diederik P., and Max Welling. "Auto-Encoding Variational Bayes." In International
Conference on Learning Representations (ICLR), 2014.
¡ Jordan, Michael I., and Tom M. Mitchell. "Machine learning: Trends, perspectives, and
prospects." Science 349, no. 6245 (2015): 255-260.
¡ Tosh, Christopher, and Sanjoy Dasgupta. "The Relative Complexity of Maximum Likelihood
Estimation, MAP Estimation, and Sampling." In COLT, PMLR 99:2993-3035, 2019.
¡ Sontag, David, and Daniel Roy. "Complexity of Inference in Latent Dirichlet Allocation." In
Advances in Neural Information Processing Systems, 2011.