Harvard Lecture Series Session 4 - Factor Analysis
Qian-Li Xue
Biostatistics Program
Harvard Catalyst | The Harvard Clinical & Translational Science Center
Short course, October 27, 2016
Well-used latent variable models

Latent variable scale    Observed variable scale
                         Continuous                  Discrete
Continuous               Factor analysis             Item response theory
Discrete                 Latent profile analysis     Latent class analysis
What is factor analysis?
Factor analysis is a theory-driven statistical data-reduction technique used to explain the covariance among observed random variables in terms of a smaller number of unobserved random variables called factors.
An Example: General Intelligence
(Charles Spearman, 1904)
[Path diagram: a single latent factor F (General Intelligence) with arrows to observed scores Y1–Y5, each with its own error term ε1–ε5.]
Why Factor Analysis?
1. Testing of theory
2. Explain covariation among multiple observed variables by mapping variables to latent constructs (called “factors”)
3. Scale development
Exploit redundancy to improve a scale’s validity and reliability
Part I. Exploratory Factor
Analysis (EFA)
One Common Factor Model:
Model Specification
[Path diagram: factor F with arrows to Y1, Y2, Y3 (loadings λ1, λ2, λ3) and error terms ε1, ε2, ε3.]

$Y_1 = \lambda_1 F + \varepsilon_1$
$Y_2 = \lambda_2 F + \varepsilon_2$
$Y_3 = \lambda_3 F + \varepsilon_3$

The factor F is not observed; only Y1, Y2, Y3 are observed
εi represents variability in Yi NOT explained by F
Yi is a linear function of F and εi
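As a quick numerical illustration (not from the lecture; the loadings 0.8, 0.6, 0.7 are made-up values), simulating from this model in Python shows that, with standardized variables, each loading comes back as corr(Yi, F):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
lam = np.array([0.8, 0.6, 0.7])            # illustrative loadings (assumed)

F = rng.standard_normal(n)                  # latent factor, var(F) = 1
# choose var(eps_i) = 1 - lam_i^2 so that var(Y_i) = 1 (standardized)
eps = rng.standard_normal((n, 3)) * np.sqrt(1 - lam**2)
Y = F[:, None] * lam + eps                  # Y_i = lam_i * F + eps_i

# corr(Y_i, F) should be close to lam_i
print(np.array([np.corrcoef(Y[:, i], F)[0, 1] for i in range(3)]).round(2))
```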
One Common Factor Model:
Model Assumptions
[Path diagram as above: F → Y1, Y2, Y3 with error terms ε1, ε2, ε3.]

Factorial causation
F is independent of εj, i.e. cov(F, εj) = 0
εi and εj are independent for i ≠ j, i.e. cov(εi, εj) = 0
Conditional independence: given the factor, observed variables are independent of one another, i.e. cov(Yi, Yj | F) = 0
One Common Factor Model:
Model Interpretation
Given all variables in standardized form, i.e. var(Yi) = var(F) = 1:
Factor loadings: $\lambda_i = \mathrm{corr}(Y_i, F)$
Two-Common Factor Model: Model Specification

[Path diagram: factors F1 and F2 each with arrows to Y1–Y6, and error terms ε1–ε6.]

$Y_i = \lambda_{i1} F_1 + \lambda_{i2} F_2 + \varepsilon_i, \quad i = 1, \ldots, 6$
Factor Pattern Matrix
Columns represent derived factors
Rows represent input variables

$$\Lambda = \begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ \vdots & \ddots & \vdots \\ \lambda_{n1} & \cdots & \lambda_{nm} \end{pmatrix}$$

Loadings represent the degree to which each of the variables “correlates” with each of the factors
Loadings range from -1 to 1
Inspection of factor loadings reveals the extent to which each of the variables contributes to the meaning of each of the factors
High loadings provide meaning and interpretation of the factors (~ regression coefficients)
Two-Common Factor Model (Orthogonal):
Model Assumptions
[Path diagram: F1 → Y1–Y6 (loadings λ11–λ61) and F2 → Y1–Y6 (loadings λ12–λ62), with error terms δ1–δ6 and no arrow between F1 and F2.]

Factorial causation
F1 and F2 are independent of εj, i.e. cov(F1, εj) = cov(F2, εj) = 0
εi and εj are independent for i ≠ j, i.e. cov(εi, εj) = 0
Conditional independence: given factors F1 and F2, observed variables are independent of one another, i.e. cov(Yi, Yj | F1, F2) = 0 for i ≠ j
Orthogonal (= independent): cov(F1, F2) = 0
Two-Common Factor Model (Orthogonal):
Model Interpretation
[Path diagram as above.]

Given all variables in standardized form, i.e. var(Yi) = var(Fj) = 1, AND orthogonal factors, i.e. cov(F1, F2) = 0:

Factor loadings: $\lambda_{ij} = \mathrm{corr}(Y_i, F_j)$
Communality of Yi: $h_i^2 = \lambda_{i1}^2 + \lambda_{i2}^2$ = % variance of Yi explained by F1 AND F2
Uniqueness of Yi: $1 - h_i^2$
Degree of factorial determination: $\sum_{i,j} \lambda_{ij}^2 / n$, n = # observed variables Y
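In code, these quantities follow directly from the loading matrix; here is a minimal sketch with a hypothetical 6 × 2 orthogonal loading matrix (the numbers are illustrative, not from the lecture):

```python
import numpy as np

# hypothetical orthogonal loadings (6 variables x 2 factors)
Lam = np.array([[0.7, 0.1],
                [0.8, 0.0],
                [0.6, 0.2],
                [0.1, 0.7],
                [0.0, 0.8],
                [0.2, 0.6]])

h2 = (Lam**2).sum(axis=1)                       # communalities h_i^2
uniqueness = 1 - h2                             # 1 - h_i^2
determination = (Lam**2).sum() / Lam.shape[0]   # sum(lam_ij^2) / n

print(h2.round(2))              # [0.5  0.64 0.4  0.5  0.64 0.4 ]
print(uniqueness.round(2))      # [0.5  0.36 0.6  0.5  0.36 0.6 ]
print(round(determination, 2))  # 0.51
```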
Two-Common Factor Model:
The Oblique Case
[Path diagram as above, with a curved double-headed arrow between F1 and F2.]

Given all variables in standardized form, i.e. var(Yi) = var(Fj) = 1, AND oblique factors (i.e. cov(F1, F2) ≠ 0):

The interpretation of the factor loadings λij changes: λij is no longer the correlation between Yi and Fj; it is the direct effect of Fj on Yi
The calculation of the communality of Yi ($h_i^2$) is more complex
Extracting initial factors
Least-squares method (e.g. principal axis factoring with iterated communalities)
Maximum likelihood method
Model Fitting: Extracting initial factors
Least-squares method (LS) (e.g. principal axis factoring with iterated communalities)
Goal: minimize the sum of squared differences between the observed and estimated correlation matrices
Fitting steps (see the sketch below):
a) Obtain initial estimates of the communalities ($h^2$), e.g. the squared multiple correlation between a variable and the remaining variables
b) Solve the objective function $\det(R_{LS} - \eta I) = 0$, where $R_{LS}$ is the correlation matrix with the $h^2$ on the main diagonal (also termed the adjusted correlation matrix) and $\eta$ is an eigenvalue
c) Re-estimate $h^2$
d) Repeat b) and c) until no improvement can be made
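A minimal numpy sketch of steps a)–d), assuming the input R is a full-rank correlation matrix; the eigen-decomposition of the adjusted matrix plays the role of solving det(R_LS − ηI) = 0:

```python
import numpy as np

def principal_axis(R, n_factors, max_iter=100, tol=1e-6):
    """Principal axis factoring with iterated communalities.

    R: observed correlation matrix (p x p); returns loadings (p x n_factors).
    """
    R = np.asarray(R, dtype=float)
    # a) initial communalities: squared multiple correlation of each
    #    variable with the remaining variables, 1 - 1/diag(R^{-1})
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        # b) eigen-decompose the adjusted correlation matrix
        #    (R with the h^2 on the main diagonal)
        R_adj = R.copy()
        np.fill_diagonal(R_adj, h2)
        eigval, eigvec = np.linalg.eigh(R_adj)
        top = np.argsort(eigval)[::-1][:n_factors]
        Lam = eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0))
        # c) re-estimate the communalities from the loadings
        h2_new = (Lam**2).sum(axis=1)
        # d) stop when the communalities no longer improve
        if np.max(np.abs(h2_new - h2)) < tol:
            break
        h2 = h2_new
    return Lam
```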
Model Fitting: Extracting initial factors
Maximum likelihood method (MLE)
Goal: maximize the likelihood of producing the observed correlation matrix
Assumption: the distribution of the variables (Y and F) is multivariate normal
Objective function: $\det(R_{MLE} - \eta I) = 0$, where $R_{MLE} = U^{-1}(R - U^2)U^{-1} = U^{-1} R_{LS} U^{-1}$ and $U^2 = \mathrm{diag}(1 - h^2)$
Iterative fitting algorithm similar to the LS approach
Exception: adjusts R by giving greater weight to correlations involving variables with smaller unique variance, i.e. $1 - h^2$
Advantage: availability of a large-sample $\chi^2$ significance test for goodness-of-fit (but it tends to select more factors for large n!)
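In practice you would use a packaged ML fitter rather than code the iterations yourself. One option is scikit-learn's FactorAnalysis, which fits the Gaussian ML factor model; note that it works from the data matrix, is not the lecture's exact algorithm, and does not report the χ² fit test. The random data below is only a placeholder:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 6))   # placeholder; use your own (standardized) data

fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_.T.round(2))    # loadings (variables x factors)
print(fa.noise_variance_.round(2))  # unique variances (1 - h^2 if X standardized)
```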
Choosing among Different Methods
Between MLE and LS:
LS is preferred with
few indicators per factor
equal loadings within factors
no large cross-loadings
no factor correlations
recovering factors with low loadings (overextraction)
Factor Rotation
Factor Rotation (Intuitively)
[Two scatter plots of variables 1–5 plotted against factor axes F1 and F2, before and after rotation: rotation moves the axes closer to the clusters of variables, simplifying interpretation.]
Factor Rotation:
Orthogonal vs. Oblique Rotation
[Diagrams contrasting orthogonal rotation (rotated axes kept perpendicular, so the factors stay uncorrelated) with oblique rotation (axes allowed to form an acute angle, so the factors may correlate).]
Factor Rotation: Unique Solution?
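The loadings are determined only up to rotation, so a rotation criterion is needed to pick one solution. Varimax is the most common orthogonal criterion; below is a compact numpy sketch of the standard SVD-based varimax algorithm, applied to an unrotated loading matrix L (a sketch, not production code); it returns the rotated loadings and the rotation matrix:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix L (p x m)."""
    p, m = L.shape
    T = np.eye(m)                    # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T                   # currently rotated loadings
        # gradient of the varimax criterion, expressed in L's frame
        G = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
        u, s, vt = np.linalg.svd(G)
        T = u @ vt                   # nearest orthogonal matrix to G
        d_new = s.sum()
        if d_new < d * (1 + tol):    # criterion stopped improving
            break
        d = d_new
    return L @ T, T
```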
Derivation of Factor Scores
Each object (e.g. each person) gets a factor score for each factor:
The factors themselves are variables
“Object’s” score is a weighted combination of scores on the input variables: $\hat{F} = \hat{W} Y$, where $\hat{W}$ is the weight matrix.
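One common choice for the weight matrix (not the only one; this is the regression, or Thomson, estimator) is $\hat{W} = R^{-1}\hat{\Lambda}$ applied to standardized data. A minimal sketch, where Z, Lam, and R are assumed to come from your own analysis:

```python
import numpy as np

def regression_scores(Z, Lam, R):
    """Regression (Thomson) factor scores.

    Z: standardized data (n x p); Lam: estimated loadings (p x m);
    R: correlation matrix of the observed variables (p x p).
    """
    W = np.linalg.solve(R, Lam)   # weight matrix W = R^{-1} Lam
    return Z @ W                  # one score per person per factor
```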
Factor Analysis with
Categorical Observed Variables
Factor analysis hinges on the correlation matrix
As long as you can get an interpretable correlation matrix, you can perform factor analysis
Binary/ordinal items?
Pearson correlation: expect attenuation!
Tetrachoric correlation (binary; an estimation sketch follows below)
Polychoric correlation (ordinal)
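For two binary items, the tetrachoric correlation can be estimated by maximum likelihood under an underlying bivariate-normal model. A minimal scipy sketch (assumes all four cells of the 2 × 2 table are non-empty; a real analysis would use a vetted implementation):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

def tetrachoric(x, y):
    """ML estimate of the tetrachoric correlation of two 0/1 variables."""
    x, y = np.asarray(x), np.asarray(y)
    n00 = np.sum((x == 0) & (y == 0)); n01 = np.sum((x == 0) & (y == 1))
    n10 = np.sum((x == 1) & (y == 0)); n11 = np.sum((x == 1) & (y == 1))
    n = n00 + n01 + n10 + n11
    tau_x = norm.ppf((n00 + n01) / n)   # threshold from P(x = 0)
    tau_y = norm.ppf((n00 + n10) / n)   # threshold from P(y = 0)

    def negloglik(rho):
        bvn = multivariate_normal([0, 0], [[1, rho], [rho, 1]])
        p00 = bvn.cdf([tau_x, tau_y])   # P(x = 0, y = 0)
        p01 = norm.cdf(tau_x) - p00     # P(x = 0, y = 1)
        p10 = norm.cdf(tau_y) - p00     # P(x = 1, y = 0)
        p11 = 1 - p00 - p01 - p10
        p = np.clip([p00, p01, p10, p11], 1e-12, 1.0)
        return -(n00 * np.log(p[0]) + n01 * np.log(p[1])
                 + n10 * np.log(p[2]) + n11 * np.log(p[3]))

    return minimize_scalar(negloglik, bounds=(-0.99, 0.99),
                           method="bounded").x
```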
Criticisms of Factor Analysis
Labels of factors can be arbitrary or lack scientific basis
Derived factors are often very obvious
defense: but we get a quantification
“Garbage in, garbage out”
really a criticism of the input variables
factor analysis just reorganizes the input matrix
The correlation matrix is often a poor measure of association of the input variables
Major steps in EFA
Part II. Confirmatory Factor
Analysis (CFA)
Exploratory vs. Confirmatory
Factor Analysis
Exploratory:
summarize data
describe correlation structure between variables
generate hypotheses
Confirmatory:
Testing correlated measurement errors
Redundancy test of one-factor vs. multi-factor models
Measurement invariance test comparing a model
across groups
Orthogonality tests
Confirmatory Factor Analysis (CFA)
Takes factor analysis a step further: we can “test” or “confirm” or “implement” a “highly constrained a priori structure that meets conditions of model identification”
But be careful: a model can never be confirmed!!
The CFA model is constructed in advance:
the number of latent variables (“factors”) is pre-set by the analyst (not usually part of the modeling)
whether a latent variable influences an observed variable is specified
measurement errors may correlate
Difference between CFA and the usual SEM:
SEM assumes causally interrelated latent variables
CFA assumes the latent variables are interrelated but not causally ordered (i.e. exogenous)
Exploratory Factor Analysis
Two-factor model:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} =
\begin{pmatrix}
\lambda_{11} & \lambda_{12} \\
\lambda_{21} & \lambda_{22} \\
\lambda_{31} & \lambda_{32} \\
\lambda_{41} & \lambda_{42} \\
\lambda_{51} & \lambda_{52} \\
\lambda_{61} & \lambda_{62}
\end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
+
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{pmatrix}$$

i.e. $x = \Lambda \xi + \delta$: every indicator loads on both factors.
CFA Notation
Two-factor model:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} =
\begin{pmatrix}
\lambda_{11} & 0 \\
\lambda_{21} & 0 \\
\lambda_{31} & 0 \\
0 & \lambda_{42} \\
0 & \lambda_{52} \\
0 & \lambda_{62}
\end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
+
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{pmatrix}$$

i.e. each indicator loads on exactly one factor: x1–x3 on ξ1 and x4–x6 on ξ2.
Difference between CFA and EFA
CFA:
$x_1 = \lambda_{11}\xi_1 + \delta_1$
$x_2 = \lambda_{21}\xi_1 + \delta_2$
$x_3 = \lambda_{31}\xi_1 + \delta_3$
$x_4 = \lambda_{42}\xi_2 + \delta_4$
$x_5 = \lambda_{52}\xi_2 + \delta_5$
$x_6 = \lambda_{62}\xi_2 + \delta_6$
$\mathrm{cov}(\xi_1, \xi_2) = \phi_{12}$

EFA:
$x_i = \lambda_{i1}\xi_1 + \lambda_{i2}\xi_2 + \delta_i, \quad i = 1, \ldots, 6$
$\mathrm{cov}(\xi_1, \xi_2) = 0$
Model Constraints
Hallmark of CFA
Purposes for setting constraints:
Test a priori theory
Ensure identifiability
Test reliability of measures
Identifiability
Let θ be a t × 1 vector containing all unknown and unconstrained parameters in a model. The parameters are identified if
$\Sigma(\theta_1) = \Sigma(\theta_2) \Rightarrow \theta_1 = \theta_2$
Estimability ≠ Identifiability !!
Identifiability – attribute of the model
Estimability – attribute of the data
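A quick necessary (though not sufficient) identifiability check is the counting rule: the number of free parameters t cannot exceed the p(p+1)/2 distinct elements of the observed covariance matrix. A trivial helper; the example counts are for the two-factor, six-indicator CFA above with factor variances fixed to 1:

```python
def t_rule(p, t):
    """Necessary condition: t free parameters <= p(p+1)/2 moments.

    Returns (condition holds?, model degrees of freedom)."""
    moments = p * (p + 1) // 2
    return t <= moments, moments - t

# 6 loadings + 6 error variances + 1 factor covariance = 13 parameters
print(t_rule(p=6, t=13))    # (True, 8)
```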
Model Constraints: Identifiability
[Two path diagrams of the same two-factor, six-indicator model showing the two standard identifiability constraints: fixing one loading per factor to 1, or fixing each factor variance to 1.]
Model Parametrization
Fix variances:
$x_1 = \lambda_{11}\xi_1 + \delta_1$
$x_2 = \lambda_{21}\xi_1 + \delta_2$
$x_3 = \lambda_{31}\xi_1 + \delta_3$
$x_4 = \lambda_{42}\xi_2 + \delta_4$
$x_5 = \lambda_{52}\xi_2 + \delta_5$
$x_6 = \lambda_{62}\xi_2 + \delta_6$
$\mathrm{cov}(\xi_1, \xi_2) = \phi_{12}$
$\mathrm{var}(\xi_1) = 1, \ \mathrm{var}(\xi_2) = 1$

Fix path:
$x_1 = \xi_1 + \delta_1$
$x_2 = \lambda_{21}\xi_1 + \delta_2$
$x_3 = \lambda_{31}\xi_1 + \delta_3$
$x_4 = \xi_2 + \delta_4$
$x_5 = \lambda_{52}\xi_2 + \delta_5$
$x_6 = \lambda_{62}\xi_2 + \delta_6$
$\mathrm{cov}(\xi_1, \xi_2) = \phi_{12}$
$\mathrm{var}(\xi_1) = \phi_{11}, \ \mathrm{var}(\xi_2) = \phi_{22}$
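For reference, this is roughly how the “fix path” parametrization could be written in Python with the semopy package (assumptions: semopy is installed and, like lavaan, fixes the first loading of each factor to 1 by default; the random data frame is only a placeholder for your own indicators):

```python
import numpy as np
import pandas as pd
from semopy import Model

# placeholder data; substitute your own indicators x1..x6
rng = np.random.default_rng(2)
data = pd.DataFrame(rng.standard_normal((300, 6)),
                    columns=[f"x{i}" for i in range(1, 7)])

# measurement model: first loading per factor fixed to 1 ("fix path");
# "F1 ~~ F2" frees the factor covariance phi_12
desc = """F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
F1 ~~ F2"""

model = Model(desc)
model.fit(data)
print(model.inspect())   # parameter estimates and standard errors
```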
Identifiability Rules for CFA
(1) Two-indicator rule (sufficient, not necessary)
1) At least two factors
2) At least two indicators per factor
3) Exactly one non-zero element per row of Λ
(translation: each x is pointed at by only one LV)
4) Non-correlated errors (Θ is diagonal)
(translation: no double-headed arrows between the δ’s)
5) Factors are correlated (Φ has no zero elements)*
(translation: there are double-headed arrows between all of the LVs)
* Alternative, less strict criterion: each factor is correlated with at least one other factor.
(see Bollen, p. 247)
[Path diagram: ξ1 → x1, x2, x3; ξ2 → x4, x5, x6; errors δ1–δ6; double-headed arrow between ξ1 and ξ2.]

$$x = \Lambda \xi + \delta, \quad
\Lambda = \begin{pmatrix}
\lambda_{11} & 0 \\
\lambda_{21} & 0 \\
\lambda_{31} & 0 \\
0 & \lambda_{42} \\
0 & \lambda_{52} \\
0 & \lambda_{62}
\end{pmatrix}$$

$$\Theta = \begin{pmatrix}
\theta_{11} & 0 & 0 & 0 & 0 & 0 \\
0 & \theta_{22} & 0 & 0 & 0 & 0 \\
0 & 0 & \theta_{33} & 0 & 0 & 0 \\
0 & 0 & 0 & \theta_{44} & 0 & 0 \\
0 & 0 & 0 & 0 & \theta_{55} & 0 \\
0 & 0 & 0 & 0 & 0 & \theta_{66}
\end{pmatrix}, \quad
\mathrm{var}(\xi) = \Phi = \begin{pmatrix} 1 & \phi_{12} \\ \phi_{12} & 1 \end{pmatrix}$$
Example: Two-Indicator Rule

[Path diagrams of models with exactly two indicators per factor and one loading per factor fixed to 1: a three-factor model (ξ1: x1, x2; ξ2: x3, x4; ξ3: x5, x6), shown with different patterns of factor correlations, and a four-factor model that adds ξ4 with indicators x7 and x8.]
Identifiability Rules for CFA
(2) Three-indicator rule (sufficient, not necessary)
1) At least one factor
2) At least three indicators per factor
3) One non-zero element per row of Λ
(translation: each x is pointed at by only one LV)
4) Non-correlated errors (Θ is diagonal)
(translation: no double-headed arrows between the δ’s)
[Note: no condition on the correlation of factors (no restrictions on Φ).]

[Path diagram: a single factor ξ1 with three indicators x1, x2, x3 and errors δ1, δ2, δ3.]
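Conditions 2)–4) are mechanical enough to check programmatically; here is a small sketch over a boolean loading pattern (True where a loading is free; the function name and interface are illustrative):

```python
import numpy as np

def three_indicator_rule(pattern, theta_diagonal=True):
    """Check the sufficient conditions of the three-indicator rule.

    pattern: boolean (p x m) array, True where lambda_ij is non-zero.
    theta_diagonal: True if the errors are uncorrelated.
    """
    P = np.asarray(pattern, dtype=bool)
    one_factor_per_x = bool((P.sum(axis=1) == 1).all())   # condition 3
    three_per_factor = bool((P.sum(axis=0) >= 3).all())   # condition 2
    return one_factor_per_x and three_per_factor and theta_diagonal

# the one-factor, three-indicator model in the diagram above
print(three_indicator_rule([[True], [True], [True]]))     # True
```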