
Multiple Correspondence Analysis

Hervé Abdi¹

&

Dominique Valentin

1 Overview
Multiple correspondence analysis (MCA) is an extension of correspondence analysis (CA) which allows one to analyze the pattern of
relationships of several categorical dependent variables. As such,
it can also be seen as a generalization of principal component analysis when the variables to be analyzed are categorical instead of
quantitative. Because MCA has been (re)discovered many times,
equivalent methods are known under several different names such
as optimal scaling, optimal or appropriate scoring, dual scaling,
homogeneity analysis, scalogram analysis, and quantification method.
Technically, MCA is obtained by using a standard correspondence analysis on an indicator matrix (i.e., a matrix whose entries are 0 or 1). The percentages of explained variance need to be corrected, and the correspondence analysis interpretation of interpoint distances needs to be adapted.
¹ In: Neil Salkind (Ed.) (2007). Encyclopedia of Measurement and Statistics.
Thousand Oaks (CA): Sage.
Address correspondence to: Hervé Abdi
Program in Cognition and Neurosciences, MS: Gr.4.1,
The University of Texas at Dallas,
Richardson, TX 75083-0688, USA
E-mail: [email protected] http://www.utd.edu/herve


2 When to use it
MCA is used to analyze a set of observations described by a set of nominal variables. Each nominal variable comprises several levels, and each of these levels is coded as a binary variable. For example, gender (F vs. M) is one nominal variable with two levels. The pattern for a male respondent will be 0 1, and 1 0 for a female respondent. The complete data table is composed of binary columns with one and only one column taking the value 1 per nominal variable.

MCA can also accommodate quantitative variables by recoding them as bins. For example, a score with a range of −5 to +5 could be recoded as a nominal variable with three levels: less than 0, equal to 0, or more than 0. With this schema, a value of 3 will be expressed by the pattern 0 0 1. The coding schema of MCA implies that each row has the same total, which for CA implies that each row has the same mass.
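
As an illustration of this coding schema (a sketch we add here, not part of the original entry; the variable names and values are invented), the recoding can be done in a few lines of Python with pandas:

    import pandas as pd

    # Two nominal variables: gender, and a quantitative score recoded into bins.
    df = pd.DataFrame({
        "gender": ["M", "F", "F", "M"],
        "score":  [3, -2, 0, 5],              # quantitative, range -5 to +5
    })

    # Recode the score into the three nominal levels described above.
    df["score_level"] = pd.cut(df["score"], bins=[-5, -0.5, 0.5, 5],
                               labels=["<0", "=0", ">0"], include_lowest=True)

    # Build the indicator matrix: one block of 0/1 columns per nominal variable.
    X = pd.get_dummies(df[["gender", "score_level"]], dtype=int)
    print(X)  # exactly one 1 per nominal variable in each row

Every row of X sums to the number of nominal variables (here 2), which is what gives each row the same mass in the CA step.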

3 An example
We illustrate the method with an example from wine tasting. Suppose that we want to evaluate the effect of the oak species on barrel-aged red Burgundy wines. First, we aged wine coming from the same harvest of Pinot Noir in six different barrels made with two types of oak. Wines 1, 5, and 6 were aged with the first type of oak, whereas wines 2, 3, and 4 were aged with the second. Next, we asked each of three wine experts to choose from two to five variables to describe the wines. For each wine and for each variable, the expert was asked to rate the intensity. The answer given by the expert was coded either as a binary answer (i.e., fruity vs. non-fruity) or as a ternary answer (i.e., no vanilla, a bit of vanilla, clear smell of vanilla). Each binary answer is represented by 2 binary columns (e.g., the answer fruity is represented by the pattern 1 0, and non-fruity by 0 1). A ternary answer is represented by 3 binary columns (i.e., the answer some vanilla is represented by the pattern 0 1 0). The results are presented in Table 1 (the same data are used to illustrate STATIS and multiple factor analysis; see the respective entries). The goal of the analysis is twofold. First, we want to obtain a typology of the wines, and second, we want to know if there is an agreement between the scales used by the experts. We will use the type of oak as a supplementary (or illustrative) variable to be projected onto the analysis after the fact. Also, after the tasting of the six wines was performed, an unknown bottle of Pinot Noir was found and tasted by the experts. This wine will be used as a supplementary observation. For this wine, when an expert was not sure of how to use a descriptor, a pattern of response such as .5 .5 was used to represent the answer.

4 Notations
There are K nominal variables; each nominal variable has J_k levels, and the sum of the J_k is equal to J. There are I observations. The I × J indicator matrix is denoted X. Performing CA on the indicator matrix will provide two sets of factor scores: one for the rows and one for the columns. These factor scores are, in general, scaled such that their variance is equal to their corresponding eigenvalue (some versions of CA compute row factor scores normalized to unity).

The grand total of the table is denoted N, and the first step of the analysis is to compute the probability matrix Z = N⁻¹X. We denote by r the vector of the row totals of Z (i.e., r = Z1, with 1 being a conformable vector of 1's), by c the vector of the column totals, and define D_c = diag{c} and D_r = diag{r}. The factor scores are obtained from the following singular value decomposition:

$$\mathbf{D}_r^{-\frac{1}{2}}\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{\mathsf{T}}\right)\mathbf{D}_c^{-\frac{1}{2}} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\mathsf{T}} \qquad (1)$$

(Δ is the diagonal matrix of the singular values, and Λ = Δ² is the matrix of the eigenvalues). The row and (respectively) column factor scores are obtained as

$$\mathbf{F} = \mathbf{D}_r^{-\frac{1}{2}}\mathbf{P}\boldsymbol{\Delta} \qquad \text{and} \qquad \mathbf{G} = \mathbf{D}_c^{-\frac{1}{2}}\mathbf{Q}\boldsymbol{\Delta}. \qquad (2)$$

The squared (χ²) distances from the rows and columns to their respective barycenter are obtained as

$$\mathbf{d}_r = \operatorname{diag}\left\{\mathbf{F}\mathbf{F}^{\mathsf{T}}\right\} \qquad \text{and} \qquad \mathbf{d}_c = \operatorname{diag}\left\{\mathbf{G}\mathbf{G}^{\mathsf{T}}\right\}. \qquad (3)$$

                      Expert 1                      Expert 2                               Expert 3
        Oak    -----------------------   ---------------------------------------   -----------------------
Wine    Type   fruity   woody   coffee   red fruit   roasted   vanillin   woody    fruity   butter   woody
W1      1      1 0      0 0 1   0 1      1 0         0 1       0 0 1      0 1      0 1      0 1      0 1
W2      2      0 1      0 1 0   1 0      0 1         1 0       0 1 0      1 0      0 1      1 0      1 0
W3      2      0 1      1 0 0   1 0      0 1         1 0       1 0 0      1 0      0 1      1 0      1 0
W4      2      0 1      1 0 0   1 0      0 1         1 0       1 0 0      1 0      1 0      1 0      1 0
W5      1      1 0      0 0 1   0 1      1 0         0 1       0 0 1      0 1      1 0      0 1      0 1
W6      1      1 0      0 1 0   0 1      1 0         0 1       0 1 0      0 1      1 0      0 1      0 1
W?      —      .5 .5    0 1 0   1 0      1 0         .5 .5     0 1 0      1 0      .5 .5    —        —

Table 1: Data for the barrel-aged red Burgundy wines example. "Oak Type" is an illustrative (supplementary) variable; the wine W? is an unknown wine treated as a supplementary observation. (Entries marked — could not be recovered from the source.)


The squared cosines between row i and factor ℓ, and between column j and factor ℓ, are obtained respectively as

$$o_{i,\ell} = \frac{f_{i,\ell}^{2}}{d_{r,i}^{2}} \qquad \text{and} \qquad o_{j,\ell} = \frac{g_{j,\ell}^{2}}{d_{c,j}^{2}} \qquad (4)$$

(with d²_{r,i} and d²_{c,j} being, respectively, the i-th element of d_r and the j-th element of d_c). Squared cosines help locate the factors important for a given observation or variable.

The contributions of row i to factor ℓ and of column j to factor ℓ are obtained respectively as

$$t_{i,\ell} = \frac{f_{i,\ell}^{2}}{\lambda_{\ell}} \qquad \text{and} \qquad t_{j,\ell} = \frac{g_{j,\ell}^{2}}{\lambda_{\ell}} \qquad (5)$$

Contributions help locate the observations or variables important for a given factor.
Supplementary or illustrative elements can be projected onto the factors using the so-called transition formula. Specifically, let i^T_sup be an illustrative row and j_sup an illustrative column to be projected. Their coordinates f_sup and g_sup are obtained as

$$\mathbf{f}_{\text{sup}} = \left(\mathbf{i}_{\text{sup}}^{\mathsf{T}}\mathbf{1}\right)^{-1}\mathbf{i}_{\text{sup}}^{\mathsf{T}}\,\mathbf{G}\,\boldsymbol{\Delta}^{-1} \qquad \text{and} \qquad \mathbf{g}_{\text{sup}} = \left(\mathbf{j}_{\text{sup}}^{\mathsf{T}}\mathbf{1}\right)^{-1}\mathbf{j}_{\text{sup}}^{\mathsf{T}}\,\mathbf{F}\,\boldsymbol{\Delta}^{-1}. \qquad (6)$$

Performing CA on the indicator matrix will provide factor scores for the rows and the columns. The factor scores given by a CA program will need, however, to be rescaled for MCA, as explained in the next section.

The J × J table obtained as B = XᵀX is called the Burt matrix associated with X. This table is important in MCA because applying CA to the Burt matrix gives the same factors as the analysis of X but is often computationally easier. The Burt matrix also plays an important theoretical rôle because the eigenvalues obtained from its analysis give a better approximation of the inertia explained by the factors than the eigenvalues of X.
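
Since the entry gives no code, here is a minimal NumPy sketch of Eqs. (1)–(6); the function names are ours, and the contributions are weighted by the masses so that they sum to 1 over each factor (the scaling used in Tables 3 and 4 below):

    import numpy as np

    def mca(X):
        """CA of an I x J indicator matrix X, following Eqs. (1)-(5)."""
        X = np.asarray(X, dtype=float)
        N = X.sum()                              # grand total of the table
        Z = X / N                                # probability matrix
        r, c = Z.sum(axis=1), Z.sum(axis=0)      # row and column masses
        Dr_isqrt = 1.0 / np.sqrt(r)              # diagonal of D_r^{-1/2}
        Dc_isqrt = 1.0 / np.sqrt(c)              # diagonal of D_c^{-1/2}
        # Eq. (1): SVD of D_r^{-1/2} (Z - r c^T) D_c^{-1/2} = P Delta Q^T
        S = Dr_isqrt[:, None] * (Z - np.outer(r, c)) * Dc_isqrt[None, :]
        P, delta, Qt = np.linalg.svd(S, full_matrices=False)
        keep = delta > 1e-12                     # drop the trivial dimensions
        P, delta, Qt = P[:, keep], delta[keep], Qt[keep]
        # Eq. (2): factor scores whose variance equals the eigenvalue
        F = Dr_isqrt[:, None] * P * delta
        G = Dc_isqrt[:, None] * Qt.T * delta
        eig = delta ** 2                         # Lambda = Delta^2
        # Eq. (3): squared chi-square distances to the barycenters
        d_r, d_c = (F ** 2).sum(axis=1), (G ** 2).sum(axis=1)
        # Eq. (4): squared cosines
        cos_r = F ** 2 / d_r[:, None]
        cos_c = G ** 2 / d_c[:, None]
        # Eq. (5): contributions, mass-weighted so each factor sums to 1
        ctr_r = r[:, None] * F ** 2 / eig[None, :]
        ctr_c = c[:, None] * G ** 2 / eig[None, :]
        return F, G, eig, cos_r, cos_c, ctr_r, ctr_c, delta

    def project_sup_row(i_sup, G, delta):
        """Eq. (6): transition formula for a supplementary row profile."""
        i_sup = np.asarray(i_sup, dtype=float)
        return (i_sup / i_sup.sum()) @ G / delta

Running mca on the indicator matrix of Table 1, and project_sup_row on the W? profile, gives the uncorrected solution; the eigenvalue rescaling described in the next section is still needed to match Tables 3 and 4 exactly.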


5 Eigenvalue correction for multiple correspondence analysis


MCA codes data by creating several binary columns for each variable, with the constraint that one and only one of the columns gets the value 1. This coding schema creates artificial additional dimensions because one categorical variable is coded with several columns. As a consequence, the inertia (i.e., variance) of the solution space is artificially inflated, and therefore the percentage of inertia explained by the first dimension is severely underestimated. In fact, it can be shown that all the factors with an eigenvalue less than or equal to 1/K simply code these additional dimensions (K = 10 in our example).
Two correction formulas are often used; the first one is due to Benzécri (1979), the second one to Greenacre (1993). These formulas take into account that the eigenvalues smaller than 1/K are coding for the extra dimensions and that MCA is equivalent to the analysis of the Burt matrix, whose eigenvalues are equal to the squared eigenvalues of the analysis of X. Specifically, if we denote by λ_ℓ the eigenvalues obtained from the analysis of the indicator matrix, then the corrected eigenvalues, denoted λ_c, are obtained as

$$\lambda_{c,\ell} = \begin{cases} \left[\dfrac{K}{K-1}\left(\lambda_{\ell} - \dfrac{1}{K}\right)\right]^{2} & \text{if } \lambda_{\ell} > \dfrac{1}{K} \\[2ex] 0 & \text{if } \lambda_{\ell} \leq \dfrac{1}{K}. \end{cases} \qquad (7)$$
Using this formula gives a better estimate of the inertia extracted by each eigenvalue.

Traditionally, the percentages of inertia are computed by dividing each eigenvalue by the sum of the eigenvalues, and this approach could be used here also. However, it would give an optimistic estimation of the percentage of inertia. A better estimation of the inertia has been proposed by Greenacre (1993), who suggested instead evaluating the percentage of inertia relative to the average inertia of the off-diagonal blocks of the Burt matrix.

            Indicator Matrix      Burt Matrix         Benzécri Correction    Greenacre Correction
Factor       λ         τ           λ         τ         λc        τc           λc        τc
1            .8532     .7110       .7280     .9306     .7004     .9823        .7004     .9519
2            .2000     .1667       .0400     .0511     .0123     .0173        .0123     .0168
3            .1151     .0959       .0133     .0169     .0003     .0004        .0003     .0004
4            .0317     .0264       .0010     .0013     0         0            0         0
Σ           1.2000                 .7822                .7130                  .7130     .9691

Table 2: Eigenvalues, corrected eigenvalues, proportions of explained inertia, and corrected proportions of explained inertia. The eigenvalues of the Burt matrix are equal to the squared eigenvalues of the indicator matrix; the corrected eigenvalues for Benzécri and Greenacre are the same, but the proportions of explained variance differ. Eigenvalues are denoted by λ, proportions of explained inertia by τ (note that the average inertia used to compute Greenacre's correction is equal to Ī = .7358).


This average inertia, denoted Ī, can be computed as

$$\bar{\mathcal{I}} = \frac{K}{K-1}\left(\sum_{\ell}\lambda_{\ell}^{2} - \frac{J-K}{K^{2}}\right). \qquad (8)$$

According to this approach, the percentage of inertia would be obtained by the ratio

$$\tau_{c,\ell} = \frac{\lambda_{c,\ell}}{\bar{\mathcal{I}}} \qquad \text{instead of} \qquad \frac{\lambda_{c,\ell}}{\sum_{\ell}\lambda_{c,\ell}}. \qquad (9)$$
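
These corrections are straightforward to compute. The sketch below (ours, not from the original entry) applies Eqs. (7)–(9) to the indicator-matrix eigenvalues of Table 2, with K = 10 variables and J = 22 binary columns:

    import numpy as np

    def corrected_inertia(eig, K, J):
        eig = np.asarray(eig, dtype=float)
        # Eq. (7): Benzecri correction; eigenvalues <= 1/K code extra dimensions
        lam_c = np.where(eig > 1 / K, (K / (K - 1) * (eig - 1 / K)) ** 2, 0.0)
        # Eq. (8): average inertia of the off-diagonal blocks of the Burt matrix
        I_bar = K / (K - 1) * ((eig ** 2).sum() - (J - K) / K ** 2)
        tau_benzecri = lam_c / lam_c.sum()   # traditional (optimistic) percentages
        tau_greenacre = lam_c / I_bar        # Eq. (9)
        return lam_c, I_bar, tau_benzecri, tau_greenacre

    # Indicator-matrix eigenvalues from Table 2 (the remaining ones are zero,
    # since these four already sum to the total inertia (J - K)/K = 1.2).
    lam_c, I_bar, tau_b, tau_g = corrected_inertia([.8532, .2000, .1151, .0317],
                                                   K=10, J=22)
    # lam_c -> [.7004, .0123, .0003, 0]      I_bar -> .7358
    # tau_b -> [.9823, .0173, .0004, 0]      tau_g -> [.9519, .0168, .0004, 0]

The printed results match the Benzécri and Greenacre columns of Table 2.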

6 Interpreting MCA
As with CA, the interpretation in MCA is often based upon proximities between points in a low-dimensional map (i.e., two or three dimensions). As in CA, proximities are meaningful only between points from the same set (i.e., rows with rows, columns with columns). Specifically, when two row points are close to each other, they tend to select the same levels of the nominal variables. For the proximity between variables we need to distinguish two cases. First, the proximity between levels of different nominal variables means that these levels tend to appear together in the observations. Second, because the levels of the same nominal variable cannot occur together, we need a different type of interpretation for this case. Here the proximity between levels means that the groups of observations associated with these two levels are themselves similar.

6.1 The example


Table 2 lists the corrected eigenvalues and proportions of explained inertia obtained with the Benzécri/Greenacre correction formula. Tables 3 and 4 give the corrected factor scores, cosines, and contributions for the rows and columns of Table 1. Figure 1 displays the projections of the rows and the columns. We have separated these two sets but, because the projections have the same variance, these two graphs could be displayed together (as long as one keeps in mind that distances between points are meaningful only within the same set).

                        Wine 1   Wine 2   Wine 3   Wine 4   Wine 5   Wine 6   Wine ?
Factor scores
  Factor 1                0.86    -0.71    -0.92    -0.86     0.92     0.71     0.03
  Factor 2                0.08    -0.16     0.08     0.08     0.08    -0.16    -0.16
Squared cosines
  Factor 1                 .62      .42      .71      .62      .71      .42      .04
  Factor 2                 .01      .02      .01      .01      .01      .02      .96
Contributions × 1000
  Factor 1                 177      121      202      177      202      121       —
  Factor 2                  83      333       83       83       83      333       —

Table 3: Factor scores, squared cosines, and contributions for the observations (I-set). The eigenvalues and proportions of explained inertia are corrected using the Benzécri/Greenacre formula (Factor 1: λ = .7004, τ = 95%; Factor 2: λ = .0123, τ = 2%). The mystery wine (Wine ?) is a supplementary observation. Only the first two factors are reported.

H. Abdi & D. Valentin: Multiple Correspondence Analysis

[Figure 1: Multiple correspondence analysis. Projections on the first 2 dimensions. The eigenvalues (λ) and proportions of explained inertia (τ) have been corrected with the Benzécri/Greenacre formula (λ1 = .7004, τ1 = 95%; λ2 = .0123, τ2 = 2%). (a) The I-set: rows (i.e., wines); Wine ? is a supplementary element. (b) The J-set: columns (i.e., adjectives); Oak 1 and Oak 2 are supplementary elements. (The projection points have been slightly moved to increase readability; projections from Tables 3 and 4.)]


                          Factor Scores      Squared Cosines     Contributions × 1000
                           F1       F2        F1       F2          F1       F2
Expert 1
  fruity     y             0.90     0.00      .81      .00          58        0
             n            -0.90     0.00      .81      .00          58        0
  woody      a            -0.97     0.18      .47      .02          44       83
             b             0.00    -0.35      .00      .06           0      333
             c             0.97     0.18      .47      .02          44       83
  coffee     y            -0.90     0.00      .81      .00          58        0
             n             0.90     0.00      .81      .00          58        0
Expert 2
  red fruit  y             0.90     0.00      .81      .00          58        0
             n            -0.90     0.00      .81      .00          58        0
  roasted    y            -0.90     0.00      .81      .00          58        0
             n             0.90     0.00      .81      .00          58        0
  vanillin   a            -0.97     0.18      .47      .02          44       83
             b             0.00    -0.35      .00      .06           0      333
             c             0.97     0.18      .47      .02          44       83
  woody      y            -0.90     0.00      .81      .00          58        0
             n             0.90     0.00      .81      .00          58        0
Expert 3
  fruity     y             0.28     0.00      .08      .00           6        0
             n            -0.28     0.00      .08      .00           6        0
  butter     y            -0.90     0.00      .81      .00          58        0
             n             0.90     0.00      .81      .00          58        0
  woody      y            -0.90     0.00      .81      .00          58        0
             n             0.90     0.00      .81      .00          58        0
Oak (supplementary)
  Oak 1                    0.90     0.00     1.00      .00           —        —
  Oak 2                   -0.90     0.00     1.00      .00           —        —

Table 4: Factor scores, squared cosines, and contributions for the variables (J-set). The eigenvalues and percentages of inertia have been corrected using the Benzécri/Greenacre formula (Factor 1: λ = .7004, τ = 95%; Factor 2: λ = .0123, τ = 2%). Oak 1 and Oak 2 are supplementary variables.


The analysis is essentially one-dimensional, with Wines 2, 3, and 4 clustered on the negative side of the first factor and Wines 1, 5, and 6 on the positive side. The supplementary (mystery) wine does not seem to belong to either cluster. The analysis of the columns shows that the negative side of the factor is characterized as non-fruity, non-woody, and coffee by Expert 1; as roasted, non-fruity, low in vanilla, and woody by Expert 2; and as buttery and woody by Expert 3. The positive side gives the reverse pattern. The supplementary elements indicate that the negative side is correlated with the second type of oak, whereas the positive side is correlated with the first type of oak.

7 Alternatives to MCA
Because the interpretation of MCA is more delicate than that of simple CA, several approaches have been suggested to offer the simplicity of interpretation of CA for indicator matrices. One approach is to use a different metric than χ², the most attractive alternative being the Hellinger distance (see the entry on distances, and Escofier, 1978; Rao, 1995). Another approach, called joint correspondence analysis, fits only the off-diagonal tables of the Burt matrix (see Greenacre, 1993), and can be interpreted as a factor-analytic model.
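
As a point of comparison (an illustration we add here, not code from the entry; some authors include an extra 1/√2 factor in the definition), the Hellinger distance between two row profiles can be computed as:

    import numpy as np

    def hellinger(x, y):
        """Hellinger distance between two row profiles (relative frequencies)."""
        p = np.asarray(x, dtype=float); p = p / p.sum()
        q = np.asarray(y, dtype=float); q = q / q.sum()
        return np.sqrt(((np.sqrt(p) - np.sqrt(q)) ** 2).sum())

Unlike the χ² distance, this metric does not weight the squared differences by the column masses, which is what makes it usable directly on indicator matrices.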

References
[1] Benzécri, J.P. (1979). Sur le calcul des taux d'inertie dans l'analyse d'un questionnaire. Cahiers de l'Analyse des Données, 4, 377–378.
[2] Clausen, S.E. (1998). Applied correspondence analysis. Thousand Oaks (CA): Sage.
[3] Escofier, B. (1978). Analyse factorielle et distances répondant au principe d'équivalence distributionnelle. Revue de Statistique Appliquée, 26, 29–37.
[4] Greenacre, M.J. (1984). Theory and applications of correspondence analysis. London: Academic Press.
[5] Greenacre, M.J. (1993). Correspondence analysis in practice. London: Academic Press.
[6] Rao, C. (1995). Use of Hellinger distance in graphical displays. In E.-M. Tiit, T. Kollo, & H. Niemi (Eds.): Multivariate statistics and matrices in statistics. Leiden (Netherlands): Brill Academic Publisher, pp. 143–161.
[7] Weller, S.C., & Romney, A.K. (1990). Metric scaling: Correspondence analysis. Thousand Oaks (CA): Sage.

Acknowledgments
Many thanks to Szymon Czarnik and Michael Greenacre for pointing out a mistake in a previous version of this paper.
