Finite Mixture Modelling: Model Specification, Estimation & Application
Bettina Grün
The finite mixture model is defined as

$$H(y \mid x, \Theta) = \sum_{k=1}^{K} \pi_k F_k(y \mid x, \theta_k)$$

with $\sum_{k=1}^{K} \pi_k = 1$ and $\pi_k > 0$ for all $k$.
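To make the definition concrete, the following R sketch simulates data from a two-component Gaussian mixture; all parameter values are made up for illustration and are not taken from the slides.

## Simulate N observations from a two-component Gaussian mixture
## (illustrative values for the priors pi_k, means mu_k and sds sd_k).
set.seed(1)
N    <- 500
pi_k <- c(0.3, 0.7)                               # sum to 1, all positive
mu_k <- c(0, 5)
sd_k <- c(1, 2)
z <- sample(1:2, N, replace = TRUE, prob = pi_k)  # latent component labels
y <- rnorm(N, mean = mu_k[z], sd = sd_k[z])       # observed data
hist(y, breaks = 30)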
Types of applications:
Special cases:
model-based clustering
[Figure: example data $y_n$]
Estimation
Maximum-Likelihood: Expectation-Maximization (EM) Algorithm (Dempster, Laird and Rubin, 1977)
[Figure: example data $y_n$ plotted against $x$]
Missing data: the component memberships $z_{nk}$ are unobserved and treated as missing data.
EM algorithm: E-step
Given the current parameter estimates $\Theta^{(i)}$, replace the missing data $z_{nk}$ by the estimated a-posteriori probabilities

$$\hat{z}_{nk}^{(i)} = \frac{\pi_k^{(i)} f_k(y_n \mid x_n, \theta_k^{(i)})}{\sum_{u=1}^{K} \pi_u^{(i)} f_u(y_n \mid x_n, \theta_u^{(i)})},$$

which satisfy $\sum_{k=1}^{K} \hat{z}_{nk}^{(i)} = 1$ for all $n = 1, \ldots, N$.
The expected complete-data log-likelihood is then given by

$$Q(\Theta; \Theta^{(i)}) = \sum_{k=1}^{K} \sum_{n=1}^{N} \hat{z}_{nk}^{(i)} \log \pi_k + \sum_{k=1}^{K} \sum_{n=1}^{N} \hat{z}_{nk}^{(i)} \log f_k(y_n \mid x_n, \theta_k).$$
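As an illustration only, the E-step for a univariate Gaussian mixture can be written in a few lines of R; pi_k, mu_k and sd_k are assumed to hold the current parameter estimates (a sketch, not the slides' implementation).

## E-step: N x K matrix of posterior probabilities z_hat[n, k].
e_step <- function(y, pi_k, mu_k, sd_k) {
  dens <- sapply(seq_along(pi_k),
                 function(k) pi_k[k] * dnorm(y, mu_k[k], sd_k[k]))
  dens / rowSums(dens)     # normalise each row to sum to 1
}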
EM algorithm: M-step
The expected complete-data log-likelihood $Q(\Theta; \Theta^{(i)})$ is maximized with respect to $\Theta$. The estimates for the prior class probabilities are given by

$$\pi_k^{(i+1)} = \frac{1}{N} \sum_{n=1}^{N} \hat{z}_{nk}^{(i)},$$

and the component-specific parameters by

$$\theta_k^{(i+1)} = \arg\max_{\theta_k} \sum_{n=1}^{N} \hat{z}_{nk}^{(i)} \log f_k(y_n \mid x_n, \theta_k).$$

For Gaussian components this yields the weighted means and covariance matrices

$$\mu_k^{(i+1)} = \frac{\sum_{n=1}^{N} \hat{z}_{nk}^{(i)} y_n}{\sum_{n=1}^{N} \hat{z}_{nk}^{(i)}}, \qquad
\Sigma_k^{(i+1)} = \frac{\sum_{n=1}^{N} \hat{z}_{nk}^{(i)} (y_n - \mu_k^{(i+1)})(y_n - \mu_k^{(i+1)})'}{\sum_{n=1}^{N} \hat{z}_{nk}^{(i)}}.$$
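A matching M-step for the univariate Gaussian case, again only a sketch under the same assumptions as the E-step sketch above; the loop at the end uses e_step() and the simulated data y from the earlier sketch.

## M-step: update pi_k, mu_k, sd_k from the posteriors z_hat (N x K).
m_step <- function(y, z_hat) {
  n_k  <- colSums(z_hat)                         # effective component sizes
  pi_k <- n_k / length(y)                        # prior class probabilities
  mu_k <- colSums(z_hat * y) / n_k               # weighted means
  sd_k <- sqrt(colSums(z_hat * outer(y, mu_k, "-")^2) / n_k)  # weighted sds
  list(pi_k = pi_k, mu_k = mu_k, sd_k = sd_k)
}

## A few EM iterations, starting from rough guesses.
pars <- list(pi_k = c(0.5, 0.5), mu_k = c(min(y), max(y)), sd_k = c(1, 1))
for (i in 1:100) {
  z_hat <- e_step(y, pars$pi_k, pars$mu_k, pars$sd_k)
  pars  <- m_step(y, z_hat)
}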
Estimation: EM algorithm
Advantages:
Bayesian estimation

MCMC estimation via data augmentation alternates between drawing the classifications given the current parameter values,

$$P(z_{nk}^{(i)} = 1 \mid y_n, \Theta^{(i)}) \propto \pi_k^{(i)} f_N(y_n; \mu_k^{(i)}, \Sigma_k^{(i)}),$$

and simulating the parameters conditional on the classification.

Advantages:
Relatively easy to implement.
Different mixture models differ only in the parameter simulation step.
Parameter simulation conditional on the classification is sometimes already available.

Disadvantages:
Might fail to escape the attraction area of one mode, so that not all posterior modes are visited.
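For a univariate Gaussian mixture, the classification step above could look as follows in R (a sketch; pi_k, mu_k and sd_k denote the current parameter draws).

## Draw one class label per observation, with probabilities proportional
## to pi_k * f_N(y_n; mu_k, sd_k).
sample_z <- function(y, pi_k, mu_k, sd_k) {
  prob <- sapply(seq_along(pi_k),
                 function(k) pi_k[k] * dnorm(y, mu_k[k], sd_k[k]))
  prob <- prob / rowSums(prob)
  apply(prob, 1, function(p) sample(seq_along(p), size = 1, prob = p))
}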
Bayes factors
Label switching

Initialization:
random
cluster analysis results, e.g. hierarchical clustering, k-means

Mixtures of regressions:
mixtures of generalized linear models
mixtures of generalized linear mixed models
Software in R

Model-based clustering:
mclust (Fraley and Raftery, 2002) for Gaussian mixtures:
specify different models depending on the structure of the variance-covariance matrices (volume, shape, orientation), $\Sigma_k = \lambda_k D_k \operatorname{diag}(a_k) D_k'$ (a usage sketch follows below)
initialize the EM algorithm with the solution from an agglomerative hierarchical clustering algorithm

Clusterwise regression:
flexmix (Leisch, 2004)

See also the CRAN Task View "Cluster Analysis & Finite Mixture Models".
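As a rough sketch of the mclust interface (not taken from the slides), a Gaussian mixture with fully unrestricted covariance matrices (model name "VVV") can be fitted to the diabetes data used in the clustering example below.

library("mclust")
data("diabetes", package = "mclust")
## 1 to 5 components with unrestricted covariance matrices; EM is
## initialized with model-based agglomerative hierarchical clustering.
fit <- Mclust(diabetes[, 2:4], G = 1:5, modelNames = "VVV")
summary(fit)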
Software: FlexMix

The function flexmix() provides the E-step and all data handling.
The M-step is supplied by the user, similar to glm() families.
Multiple independent responses from different families.
Currently bindings to several GLM families exist (Gaussian, Poisson, Gamma, Binomial); see the sketch below.
Weighted, hard (CEM) and random (SEM) classification.
Components with a prior probability below a user-specified threshold are automatically removed during the iterations.
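As a sketch of the GLM bindings (made-up data, not from the slides), a two-component mixture of Poisson regressions can be specified via the FLXMRglm() driver.

library("flexmix")
## Simulate count data from two latent groups.
set.seed(42)
x <- runif(200, 0, 10)
g <- sample(1:2, 200, replace = TRUE, prob = c(0.4, 0.6))   # latent groups
y <- rpois(200, lambda = exp(c(0.5, 1.5)[g] + c(0.30, 0.05)[g] * x))
dat <- data.frame(x = x, y = y)
## Two-component mixture of Poisson regressions of y on x.
pm <- flexmix(y ~ x, data = dat, k = 2, model = FLXMRglm(family = "poisson"))
summary(pm)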
FlexMix Design
Primary goal is extensibility: ideal for trying out new mixture models.
Not a replacement for specialized mixture packages like mclust, but a complement.
Usage of S4 classes and methods
Formula-based interface
Multivariate responses:
combination of univariate families: the responses are assumed independent (given x), and each response may have its own model formula, i.e., a different set of regressors (see the sketch below)
multivariate families: if the family handles the multivariate response directly, arbitrary multivariate response distributions are possible
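A sketch of the combination of univariate families, assuming a data frame dat2 with a Gaussian response y1, a count response y2 and a regressor x (all names hypothetical): each driver's formula is completed relative to the overall formula, and the responses are treated as independent given x.

## y1: Gaussian regression on x; y2: Poisson model with intercept only.
m <- flexmix(y1 ~ x, data = dat2, k = 2,
             model = list(FLXMRglm(y1 ~ .),
                          FLXMRglm(y2 ~ 1, family = "poisson")))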
Example: Clustering

> library("flexmix")
> data("diabetes", package = "mclust")
> diabetes_data <- as.matrix(diabetes[, 2:4])
> mix <- stepFlexmix(diabetes_data ~ 1, model = FLXMCmvnorm(diag = FALSE),
+   k = 1:5, nrep = 10)
> mix

Call:
stepFlexmix(diabetes_data ~ 1, model = FLXMCmvnorm(diag = FALSE),
    k = 1:5, nrep = 10)

  iter converged k k0    logLik      AIC      BIC      ICL
1    2      TRUE 1  1 -2545.833 5109.666 5136.456 5136.456
2   12      TRUE 2  2 -2354.674 4747.347 4803.905 4811.644
3   24      TRUE 3  3 -2303.557 4665.113 4751.439 4770.353
4   36      TRUE 4  4 -2287.605 4653.210 4769.302 4793.502
5   60      TRUE 5  5 -2274.655 4647.309 4793.169 4822.905

[Figure: pairwise scatter plots of glucose, insulin and sspg for the diabetes data]

> plot(mix)
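The slides do not show how the 3-component model mix_best used below was selected; since the BIC in the table above is smallest for k = 3, one way to obtain it is getModel().

## Extract the model minimising the BIC (here k = 3) from the stepFlexmix fit.
mix_best <- getModel(mix, which = "BIC")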
Example: Clustering

[Figure: AIC, BIC and ICL against the number of components, as produced by plot(mix)]
Example: Clustering

> summary(mix_best)

Call:
stepFlexmix(diabetes_data ~ 1, model = FLXMCmvnorm(diag = FALSE),
    k = 3, nrep = 10)

Cluster sizes:
 1  2  3 
82 28 35 

convergence after 24 iterations

$cov
         glucose   insulin      sspg
glucose 58.21456   80.1404   16.8295
insulin 80.14039 2154.9810  347.6972
sspg    16.82950  347.6972 2484.1538

> plot(mix_best, mark = 2)

[Figure: pairwise scatter plots of glucose, insulin and sspg with component 2 marked]
Example: Clustering

[Figure: posterior probabilities (0.0 to 1.0) for Comp. 1, Comp. 2 and Comp. 3]

Example: Regression

[Figure: scatter plot of the aphids data against the number of aphids]
Example: Regression
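The call that produced mix for this example is not shown; assuming the data sit in a data frame aphids_df with the number of released aphids in n.aphids (as used in predict() below) and the number of infected plants in a column such as n.inf (hypothetical names), a two-component mixture of linear regressions could be fitted along these lines.

## Two-component mixture of linear regressions (default Gaussian family).
mix <- flexmix(n.inf ~ n.aphids, data = aphids_df, k = 2)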
> posterior(mix)[1:4, ]
          [,1]        [,2]
[1,] 0.9949732 0.005026814
[2,] 0.9949769 0.005023128
[3,] 0.2098020 0.790198026
[4,] 0.2050383 0.794961704

> predict(mix, newdata = data.frame(n.aphids = c(0, 300)))
$Comp.1
       [,1]
1  3.458813
2 20.047842

$Comp.2
       [,1]
1 0.8679776
2 1.5740946
Example: Regression

> refit(mix)

Call:
refit(mix)

Number of components: 2

$Comp.1

$Comp.2
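Standard errors and significance tests for the component-specific coefficients can be obtained by summarising the refitted model.

## Coefficient estimates, standard errors and z tests per component.
summary(refit(mix))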
[Figure: fitted components Comp. 1 and Comp. 2 for the aphids data, number of aphids on the x-axis]

Applications
References

Monographs:

D. Böhning. Computer Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping, and Others. Chapman & Hall/CRC, London, 1999.

S. Frühwirth-Schnatter. Finite Mixture and Markov Switching Models. Springer-Verlag, New York, 2006.