Does anybody know how to find confidence intervals for the parameters of a mixture of Gaussians that were estimated with the EM algorithm?
Could you be somewhat more clear? Why do you think EM is interesting in this case? What have you tried yourself? Also, can you describe your data clearly and specify which parameters you actually want estimated? This may help us understand exactly what you expect. – Nick Sabbe, commented May 2, 2013 at 12:59
@NickSabbe: Hi Nick, sorry for my unclear question, I'm not good at math... :). My data come from a mixture of normal distributions (the number of components is known), and I have used the EM algorithm to estimate the proportion, mean and variance of each component. Supposing the dataset is big enough, what I want to know is how to find the confidence interval for each estimated proportion, mean and variance. – An Mai, commented May 2, 2013 at 13:20
@AnMai I had a similar problem to yours, check my posts: stats.stackexchange.com/questions/54736/… and stats.stackexchange.com/questions/54726/… The basic point is that for confidence intervals you will need the standard errors. You can get them via boot.se, but the problem (see my posts) is that this is not consistent: you get different values every time you run the code. The better solution is to use bootstrap confidence intervals: projecteuclid.org/… – Stat Tistician, commented May 2, 2013 at 14:48
See also this paper: citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.203 – Stat Tistician, commented May 2, 2013 at 14:53
@Stat Tistician I also encountered the same problem. Thanks for sharing, I have downloaded all the documents you list above. However, I am bad at math, so could you just tell me whether I can use the bootstrap or other tools to calculate confidence intervals for the normal distribution components obtained by the EM algorithm? – user26940, commented Jun 16, 2013 at 6:30
2 Answers
This is a natural question but as such it has no answer: EM is an optimisation algorithm, not a statistical inference principle. As such, it returns a maximum likelihood point estimate of the parameter in the best case (or a local mode in worse cases). To find confidence intervals on the parameters, you need to involve other statistical principles, such as the bootstrap or Bayesian inference. Note that in the case of mixtures, standard approximations fail because of the lack of identifiability of the parameters and the degeneracy at the boundaries of the parameter space.
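To make the bootstrap option concrete, here is a minimal sketch of a nonparametric percentile bootstrap wrapped around a hand-written EM fit for a two-component, one-dimensional Gaussian mixture. Everything in it is an illustrative assumption rather than something from the question: the function names `fit_em` and `bootstrap_ci`, the simulated data, the half-split initialisation, the variance floor, and the number of resamples. Label switching is handled crudely by sorting components on their means, which is only reasonable when the components are well separated; the identifiability caveats above still apply.

```python
import numpy as np

def fit_em(x, n_iter=100, var_floor=1e-3):
    """EM for a 2-component 1-D Gaussian mixture.

    Returns [w1, w2, mu1, mu2, var1, var2] with components ordered by mean.
    """
    x = np.sort(np.asarray(x, dtype=float))
    lo, hi = x[: len(x) // 2], x[len(x) // 2:]
    # Deterministic start: lower and upper halves of the sorted data
    w = np.array([0.5, 0.5])
    mu = np.array([lo.mean(), hi.mean()])
    var = np.maximum(np.array([lo.var(), hi.var()]), var_floor)
    for _ in range(n_iter):
        # E-step: responsibilities, shape (n, 2)
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, means and variances
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, var_floor)
    order = np.argsort(mu)  # fix the labelling by ordering on the mean
    return np.concatenate([w[order], mu[order], var[order]])

def bootstrap_ci(x, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap intervals for the EM estimates."""
    rng = np.random.default_rng(seed)
    est = fit_em(x)
    # Refit EM on each resample drawn with replacement from the data
    boot = np.array([
        fit_em(rng.choice(x, size=len(x), replace=True)) for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return est, lower, upper

# Simulated data (assumed for illustration): 30% N(0, 1) and 70% N(5, 1.5^2)
data_rng = np.random.default_rng(42)
x = np.concatenate([data_rng.normal(0.0, 1.0, 300), data_rng.normal(5.0, 1.5, 700)])

est, lower, upper = bootstrap_ci(x)
for name, e, l, u in zip(["w1", "w2", "mu1", "mu2", "var1", "var2"], est, lower, upper):
    print(f"{name}: {e:.3f}  [{l:.3f}, {u:.3f}]")
```

Fixing the seed makes the intervals reproducible from run to run; with a different seed the endpoints move slightly, and increasing `n_boot` reduces that Monte Carlo noise.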
If this were me, I would formulate this as a Bayesian problem, like Xi'an said, so that confidence (well, credible) intervals fall out naturally. Since the mixture component means are then treated as random variables, the posterior distribution will tell you everything you need to know about your estimated parameters, beyond just point estimates and intervals. While it's true that standard Bayesian methods like MCMC can perform poorly on mixture model data due to lack of identifiability and a highly multimodal posterior, you can mitigate this computationally (for example, by using Potentials in PyMC3) or by switching to Variational Inference, which is not only better suited to this problem but will also give you back a distribution for each parameter.
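As a rough illustration of how credible intervals come out of a posterior sample, here is a sketch of a plain Gibbs sampler for a two-component, one-dimensional Gaussian mixture, written in NumPy only. It is not the PyMC3 Potentials or Variational Inference route mentioned above; it is a swapped-in, hand-written sampler chosen so the example does not depend on a particular PyMC version. The priors (Dirichlet(1, 1) on the proportions, a wide normal on the means, an inverse-gamma on the variances), the simulated data, and the name `gibbs_mixture` are all assumptions made for illustration. Label switching is handled by reordering each draw on the means, which is a crude fix that works only for well-separated components.

```python
import numpy as np

def gibbs_mixture(x, n_draws=2000, burn_in=500, seed=0):
    """Gibbs sampler for a 2-component 1-D Gaussian mixture.

    Returns an array of draws with columns [w1, w2, mu1, mu2, var1, var2],
    relabelled within each draw so that mu1 <= mu2.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    # Weakly informative priors (illustrative choices)
    m0, s0_sq = x.mean(), 10.0 * x.var()  # mu_k ~ Normal(m0, s0_sq)
    a0, b0 = 2.0, x.var()                 # var_k ~ InvGamma(a0, b0)
    # Starting values
    w = np.array([0.5, 0.5])
    mu = np.quantile(x, [0.25, 0.75])
    var = np.array([x.var(), x.var()])
    draws = np.empty((n_draws, 6))
    for t in range(burn_in + n_draws):
        # 1. Allocations z_i given the current parameters
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        p1 = dens[:, 1] / dens.sum(axis=1)
        z = (rng.random(n) < p1).astype(int)
        for k in range(2):
            xk = x[z == k]
            nk = len(xk)
            # 2. Mean given variance and allocations (conjugate normal update)
            prec = 1.0 / s0_sq + nk / var[k]
            post_mean = (m0 / s0_sq + xk.sum() / var[k]) / prec
            mu[k] = rng.normal(post_mean, np.sqrt(1.0 / prec))
            # 3. Variance given mean and allocations (conjugate inverse-gamma update)
            rate = b0 + 0.5 * ((xk - mu[k]) ** 2).sum()
            var[k] = 1.0 / rng.gamma(a0 + 0.5 * nk, 1.0 / rate)
        # 4. Proportions given allocations
        w = rng.dirichlet(1.0 + np.bincount(z, minlength=2))
        if t >= burn_in:
            order = np.argsort(mu)  # post-hoc relabelling by mean
            draws[t - burn_in] = np.concatenate([w[order], mu[order], var[order]])
    return draws

# Simulated data (assumed for illustration): 30% N(0, 1) and 70% N(5, 1.5^2)
data_rng = np.random.default_rng(42)
x = np.concatenate([data_rng.normal(0.0, 1.0, 300), data_rng.normal(5.0, 1.5, 700)])

draws = gibbs_mixture(x)
# Equal-tailed 95% credible intervals, one per parameter
ci_low, ci_high = np.quantile(draws, [0.025, 0.975], axis=0)
for name, m, l, u in zip(["w1", "w2", "mu1", "mu2", "var1", "var2"],
                         draws.mean(axis=0), ci_low, ci_high):
    print(f"{name}: {m:.3f}  [{l:.3f}, {u:.3f}]")
```

Because the whole posterior sample is kept in `draws`, other summaries (posterior standard deviations, intervals for functions of the parameters such as `mu2 - mu1`) can be computed from the same array.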