Harvard Lecture Series Session 4 - Factor Analysis

Factor Analysis

Qian-Li Xue
Biostatistics Program
Harvard Catalyst | The Harvard Clinical & Translational Science
Short course, October 27, 2016

Well-used latent variable models

Latent variable scale | Observed variable scale: Continuous | Observed variable scale: Discrete
Continuous | Factor analysis | Discrete FA; IRT (item response)
Discrete | Latent profile; growth mixture | Latent class analysis, regression

General software: MPlus, Latent Gold, WinBugs (Bayesian), NLMIXED (SAS)

• What is factor analysis?
• What do we need factor analysis for?
• What are the modeling assumptions?
• How to specify, fit, and interpret factor models?
• What is the difference between exploratory and confirmatory factor analysis?
• What is and how to assess model identifiability?

What is factor analysis
• Factor analysis is a theory-driven
statistical data reduction technique used to
explain covariance among observed
random variables in terms of fewer
unobserved random variables named factors

An Example: General Intelligence
(Charles Spearman, 1904)

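[Path diagram: one common factor, General (intelligence), points to five observed variables Y1 to Y5; each Yi has its own error term δi.]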

Why Factor Analysis?
1. Testing of theory
  - Explain covariation among multiple observed variables by mapping variables to latent constructs (called “factors”)

2. Understanding the structure underlying a set of observed variables
  - Gain insight into dimensions
  - Construct validation (e.g., convergent validity)

3. Scale development
  - Exploit redundancy to improve a scale’s validity and reliability

Part I. Exploratory Factor
Analysis (EFA)

One Common Factor Model:
Model Specification

Y1 = λ1 F + δ1
Y2 = λ2 F + δ2
Y3 = λ3 F + δ3

[Path diagram: F points to Y1, Y2, Y3 with loadings λ1, λ2, λ3; each Yi has its own error term δi.]
• The factor F is not observed; only Y1, Y2, Y3 are observed
• δi represents variability in Yi NOT explained by F
• Yi is a linear function of F and δi

One Common Factor Model:
Model Assumptions
Model: Yi = λi F + δi, i = 1, 2, 3
• Factorial causation
• F is independent of δj, i.e. cov(F,δj)=0
• δi and δj are independent for i≠j, i.e. cov(δi,δj)=0
• Conditional independence: Given the factor, observed variables
are independent of one another, i.e. cov( Yi ,Yj | F ) = 0
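
As a worked step, here is a sketch that uses only the assumptions listed on this slide, plus var(F) = 1 (the standardized case, taken here as an assumption). The covariance between two different indicators reduces to the product of their loadings:

```latex
\begin{aligned}
\operatorname{cov}(Y_i, Y_j)
  &= \operatorname{cov}(\lambda_i F + \delta_i,\; \lambda_j F + \delta_j) \\
  &= \lambda_i \lambda_j \operatorname{var}(F)
     + \lambda_i \operatorname{cov}(F, \delta_j)
     + \lambda_j \operatorname{cov}(\delta_i, F)
     + \operatorname{cov}(\delta_i, \delta_j) \\
  &= \lambda_i \lambda_j
     \qquad (i \neq j,\ \operatorname{var}(F) = 1).
\end{aligned}
```

The last three terms of the second line vanish because of the independence assumptions, so all covariation among the Yi is carried by the common factor.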

One Common Factor Model:
Model Interpretation
Model: Yi = λi F + δi, i = 1, 2, 3
Given all variables in standardized form, i.e. var(Yi)=var(F)=1
• Factor loadings: λi
  λi = corr(Yi,F)
• Communality of Yi: hi²
  hi² = λi² = [corr(Yi,F)]²
  = % variance of Yi explained by F
• Uniqueness of Yi: 1-hi²
  = residual variance of Yi
• Degree of factorial determination:
  = Σ λi²/n, where n = # observed variables Y
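
The quantities on this slide can be checked numerically. The sketch below assumes standardized variables and uses made-up loadings (0.9, 0.8, 0.7). These values and all variable names are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Hypothetical standardized loadings for Y1, Y2, Y3 (illustrative, not from the lecture).
loadings = np.array([0.9, 0.8, 0.7])

communality = loadings ** 2         # h_i^2 = lambda_i^2
uniqueness = 1.0 - communality      # residual variance of each Y_i
determination = communality.mean()  # sum of lambda_i^2 divided by n

# Correlation matrix implied by the one-factor model:
# lambda_i * lambda_j off the diagonal, and h_i^2 + (1 - h_i^2) = 1 on it.
implied_corr = np.outer(loadings, loadings) + np.diag(uniqueness)

print(np.round(communality, 2))
print(np.round(implied_corr, 2))
print(round(float(determination), 3))
```

Every off-diagonal correlation is fully determined by the two loadings involved, which is why a single factor can summarize the whole matrix when the model holds.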
Two-Common Factor Model (Orthogonal):
Model Specification

Y1 = λ11 F1 + λ12 F2 + δ1
Y2 = λ21 F1 + λ22 F2 + δ2
Y3 = λ31 F1 + λ32 F2 + δ3
Y4 = λ41 F1 + λ42 F2 + δ4
Y5 = λ51 F1 + λ52 F2 + δ5
Y6 = λ61 F1 + λ62 F2 + δ6

[Path diagram: F1 and F2 each point to Y1 through Y6 with loadings λij; each Yi has its own error term δi.]

F1 and F2 are common factors because they are shared by ≥2 variables!

Matrix Notation
with n variables and m factors
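Y (n×1) = Λ (n×m) F (m×1) + δ (n×1)

\[
\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}_{n \times 1}
=
\begin{bmatrix}
\lambda_{11} & \cdots & \lambda_{1m} \\
\vdots & \ddots & \vdots \\
\lambda_{n1} & \cdots & \lambda_{nm}
\end{bmatrix}_{n \times m}
\begin{bmatrix} F_1 \\ \vdots \\ F_m \end{bmatrix}_{m \times 1}
+
\begin{bmatrix} \delta_1 \\ \vdots \\ \delta_n \end{bmatrix}_{n \times 1}
\]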

Factor Pattern Matrix
Λ is the n×m matrix of loadings λij (row i = variable, column j = factor)
• Columns represent derived factors
• Rows represent input variables
• Loadings represent degree to which each of the variables “correlates” with each of the factors
• Loadings range from -1 to 1
• Inspection of factor loadings reveals extent to which each of the variables contributes to the meaning of each of the factors
• High loadings provide meaning and interpretation of factors (~ regression coefficients)
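
To connect the matrix form Y = ΛF + δ with the pattern matrix, the sketch below simulates data from a two-factor orthogonal model and compares the sample covariance with the model-implied matrix ΛΛ' + Θ. Θ denotes the diagonal matrix of unique variances. The loading values, the sample size, and all names are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 6x2 pattern matrix (illustrative values only).
Lambda = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.6, 0.2],
    [0.1, 0.7],
    [0.0, 0.8],
    [0.2, 0.6],
])
# Unique variances chosen so that every Y_i has variance 1.
theta = 1.0 - (Lambda ** 2).sum(axis=1)

n_obs = 100_000
F = rng.standard_normal((n_obs, 2))                        # orthogonal factors, variance 1
delta = rng.standard_normal((n_obs, 6)) * np.sqrt(theta)   # independent unique parts
Y = F @ Lambda.T + delta                                   # row-wise form of Y = Lambda F + delta

implied = Lambda @ Lambda.T + np.diag(theta)
sample = np.cov(Y, rowvar=False)
max_gap = np.abs(sample - implied).max()
print(round(float(max_gap), 3))
```

With a large simulated sample the largest discrepancy between the two matrices should be small, which illustrates that the loadings and unique variances together reproduce the covariance structure.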

Two-Common Factor Model (Orthogonal):
Model Assumptions
Model: Yi = λi1 F1 + λi2 F2 + δi, i = 1, ..., 6
• Factorial causation
• F1 and F2 are independent of δj, i.e. cov(F1,δj)= cov(F2,δj)= 0
• δi and δj are independent for i≠j, i.e. cov(δi,δj)=0
• Conditional independence: Given factors F1 and F2, observed variables are independent of one another, i.e. cov( Yi ,Yj | F1, F2) = 0 for i ≠j
• Orthogonal (=independent): cov(F1,F2)=0

Two-Common Factor Model (Orthogonal):
Model Interpretation
Given all variables in standardized form, i.e. var(Yi)=var(Fi)=1,
AND orthogonal factors, i.e. cov(F1,F2)=0
• Factor loadings: λij
  λij = corr(Yi,Fj)
• Communality of Yi: hi²
  hi² = λi1² + λi2² = % variance of Yi explained by F1 AND F2
• Uniqueness of Yi: 1-hi²
• Degree of factorial determination:
  = Σ λij²/n, n = # observed variables Y
Two-Common Factor Model:
The Oblique Case
Given all variables in standardized form, i.e. var(Yi)=var(Fi)=1,
AND oblique factors (i.e. cov(F1,F2)≠0)
• The interpretation of factor loadings: λij is no longer the correlation between Y and F; it is the direct effect of F on Y
• The calculation of communality of Yi (hi²) is more complex
[Path diagram: F1 and F2, now correlated, each point to Y1 through Y6; each Yi has its own error term δi.]
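
To make "more complex" concrete, here is a sketch of the two-factor oblique case. It assumes standardized variables and writes φ12 for corr(F1, F2), a symbol borrowed from the CFA notation later in these slides:

```latex
\begin{aligned}
\operatorname{corr}(Y_i, F_1) &= \lambda_{i1} + \lambda_{i2}\,\phi_{12}, \\
\operatorname{corr}(Y_i, F_2) &= \lambda_{i2} + \lambda_{i1}\,\phi_{12}, \\
h_i^2 &= \lambda_{i1}^2 + \lambda_{i2}^2 + 2\,\lambda_{i1}\lambda_{i2}\,\phi_{12}.
\end{aligned}
```

Setting φ12 = 0 recovers the orthogonal results, where loadings are correlations and the communality is the plain sum of squared loadings.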

Extracting initial factors
• Least-squares method (e.g. principal axis factoring with iterated communalities)
• Maximum likelihood method

Model Fitting: Extracting initial factors
Least-squares method (LS) (e.g. principal axis factoring with iterated communalities)
• Goal: minimize the sum of squared differences between observed and estimated corr. matrices
• Fitting steps:
a) Obtain initial estimates of communalities (h²), e.g. the squared multiple correlation between a variable and the remaining variables
b) Solve objective function: det(R_LS - ηI) = 0, where R_LS is the corr matrix with h² in the main diag. (also termed adjusted corr matrix), η is an eigenvalue
c) Re-estimate h² from the resulting loadings
d) Repeat b) and c) until no improvement can be made
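
Steps a) to d) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the routine used by any particular package. The function name, the stopping tolerance, and the test correlation matrix (built from hypothetical one-factor loadings) are all choices made here.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, max_iter=500, tol=1e-10):
    """Iterated principal axis factoring on a correlation matrix R (sketch)."""
    R = np.asarray(R, dtype=float)
    # Step a) initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        # Step b) eigen-decompose the adjusted correlation matrix.
        R_adj = R.copy()
        np.fill_diagonal(R_adj, h2)
        eigvals, eigvecs = np.linalg.eigh(R_adj)
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        # Step c) re-estimate communalities from the loadings.
        h2_new = (loadings ** 2).sum(axis=1)
        # Step d) stop once the communalities no longer change.
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return loadings, h2

# Hypothetical one-factor correlation matrix with known loadings.
true_loadings = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

est_loadings, est_h2 = principal_axis_factoring(R, n_factors=1)
print(np.round(np.abs(est_loadings[:, 0]), 3))
```

The sign of an eigenvector is arbitrary, so the loadings are compared in absolute value. With real data, communalities can drift above 1 (a Heywood case), which this sketch does not guard against.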
Model Fitting: Extracting initial factors
Maximum likelihood method (MLE)
• Goal: maximize the likelihood of producing the observed corr matrix
• Assumption: distribution of variables (Y and F) is multivariate normal
• Objective function: det(R_MLE - ηI) = 0, where R_MLE = U⁻¹(R - U²)U⁻¹ = U⁻¹ R_LS U⁻¹, and U² is diag(1-h²)
• Iterative fitting algorithm similar to LS approach
• Exception: adjust R by giving greater weights to correlations with smaller unique variance, i.e. 1-h²
• Advantage: availability of a large sample χ² significance test for
goodness-of-fit (but tends to select more factors for large n!)
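
The χ² test mentioned above is usually built from the maximum likelihood discrepancy function. The sketch below assumes three standard forms:
• the discrepancy F_ML = ln|Σ| + tr(SΣ⁻¹) - ln|S| - p
• a test statistic of (N - 1)·F_ML
• degrees of freedom [(p - m)² - (p + m)]/2 for an EFA with p variables and m factors

The function names, the sample size, and the example matrix are hypothetical, and software may apply small-sample corrections that are not shown here.

```python
import numpy as np

def ml_discrepancy(S, Sigma):
    """F_ML between an observed matrix S and a model-implied matrix Sigma."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

def efa_degrees_of_freedom(p, m):
    """Degrees of freedom of the EFA chi-square test with p variables, m factors."""
    return ((p - m) ** 2 - (p + m)) // 2

# Hypothetical observed correlation matrix with an exact one-factor structure.
lam = np.array([0.9, 0.8, 0.7, 0.6])
S = np.outer(lam, lam)
np.fill_diagonal(S, 1.0)

n_obs = 300                                        # assumed sample size
fit_one_factor = ml_discrepancy(S, S)              # perfect fit gives F_ML = 0
fit_no_factor = ml_discrepancy(S, np.eye(4))       # independence model fits worse
chi2_no_factor = (n_obs - 1) * fit_no_factor
print(round(float(chi2_no_factor), 1), efa_degrees_of_freedom(4, 1))
```

Because the statistic scales with N - 1, even a small misfit becomes significant in large samples, which is one way to read the slide's warning about selecting more factors for large n.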

Choosing among Different Methods
• Between MLE and LS
  - LS is preferred with
    - few indicators per factor
    - equal loadings within factors
    - no large cross-loadings
    - no factor correlations
    - recovering factors with low loadings (overextraction)
  - MLE is preferred with
    - multivariate normality
    - unequal loadings within factors
• Both MLE and LS may have convergence problems

Factor Rotation
• Goal is simple structure
• Make factors more easily interpretable
  - While keeping the number of factors and communalities of Ys fixed!!!
• Rotation does NOT improve fit!

Factor Rotation

To do this we “rotate” factors:
• redefine factors such that ‘loadings’ (or pattern matrix coefficients) on various factors tend to be very high (-1 or 1) or very low (0)
• intuitively, it makes sharper distinctions in the meanings of the factors

Factor Rotation (Intuitively)

[Figure: variables 1 to 5 plotted against factor axes F1 and F2, before rotation (left) and after rotation (right).]

     Before rotation        After rotation
     Factor 1  Factor 2     Factor 1  Factor 2
x1   0.4       0.69         -0.8      0
x2   0.4       0.69         -0.8      0
x3   0.65      0.32         -0.6      0.4
x4   0.69      -0.4         0         0.8
x5   0.61      -0.35        0         0.7
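
An orthogonal rotation of the unrotated loadings in this table can be sketched with varimax, one common criterion. The code uses a widely circulated SVD-based iteration. The function names are hypothetical, and the result is not claimed to reproduce the rotated column of the slide, whose rotation method is not stated.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix toward the varimax criterion (sketch)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix of the SVD-based varimax update.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break
        previous = current
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of squared loadings across variables."""
    return np.var(loadings ** 2, axis=0).sum()

# Unrotated loadings for x1..x5 taken from the table above.
unrotated = np.array([
    [0.40, 0.69],
    [0.40, 0.69],
    [0.65, 0.32],
    [0.69, -0.40],
    [0.61, -0.35],
])
rotated, rotation = varimax(unrotated)

print(np.round(rotated, 2))
print(np.round((rotated ** 2).sum(axis=1), 3))  # communalities after rotation
```

The row sums of squared loadings are the same before and after rotation. This matches the point that rotation changes interpretation but not communalities or fit.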
Factor Rotation
• Uses “ambiguity” or non-uniqueness of solution to make interpretation more simple
• Where does ambiguity come in?
  - Unrotated solution is based on the idea that each factor tries to maximize variance explained, conditional on previous factors
  - What if we take that away?
  - Then, there is not one “best” solution

Factor Rotation:
Orthogonal vs. Oblique Rotation

• Orthogonal: Factors are independent
  - varimax: maximize variance of squared loadings across variables (sum over factors)
    - Goal: the simplicity of interpretation of factors
  - quartimax: maximize variance of squared loadings across factors (sum over variables)
    - Goal: the simplicity of interpretation of variables
  - Intuition: from previous picture, there is a right angle between axes
  - Note: “Uniquenesses” remain the same!

Factor Rotation:
Orthogonal vs. Oblique Rotation

• Oblique: Factors are NOT independent. Change in “angle.”
  - oblimin: minimize covariance of squared loadings between factors.
  - promax: simplify orthogonal rotation by making small loadings even closer to zero.
  - Target matrix: choose “simple structure” a priori.
  - Intuition: from previous picture, angle between axes is not necessarily a right angle.
  - Note: “Uniquenesses” remain the same!
Pattern versus Structure Matrix
• In oblique rotation, one typically presents both a pattern matrix and a structure matrix
• Also need to report correlation between the factors
• The pattern matrix presents the usual factor loadings
• The structure matrix presents correlations between the variables and the factors
• For orthogonal factors, pattern matrix=structure matrix
• The pattern matrix is used to interpret the factors

Factor Rotation: Which to use?

• Choice is generally not critical
• Interpretation with orthogonal (varimax) is “simple” because factors are independent: “Loadings” are correlations.
• Configuration may appear more simple in oblique (promax), but correlation of factors can be difficult to reconcile.
• Theory? Are the conceptual meanings of the
factors associated?
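
The relation between pattern loadings, factor correlations, and variable-factor correlations can be illustrated numerically. The sketch assumes standardized variables and factors and uses the identity structure = pattern × Φ, where Φ is the factor correlation matrix. All numbers and names are hypothetical.

```python
import numpy as np

# Hypothetical pattern matrix (direct effects of the factors on the variables).
pattern = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.0, 0.8],
    [0.1, 0.7],
])
# Hypothetical factor correlation matrix for an oblique solution.
phi = np.array([
    [1.0, 0.3],
    [0.3, 1.0],
])

structure = pattern @ phi                       # correlations between variables and factors
communality = (structure * pattern).sum(axis=1)  # diagonal of pattern @ phi @ pattern.T

# With uncorrelated factors the two matrices coincide.
structure_orthogonal = pattern @ np.eye(2)

print(np.round(structure, 3))
print(np.round(communality, 3))
```

A variable with a zero pattern loading on a factor can still correlate with that factor through the factor correlation, which is why both matrices are reported for oblique solutions.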

Factor Rotation: Unique Solution?

• The factor analysis solution is NOT unique!
• More than one solution will yield the same model-implied correlation matrix, and therefore the same fit

Derivation of Factor Scores
• Each object (e.g. each person) gets a factor score for each factor:
  - The factors themselves are variables
  - “Object’s” score is weighted combination of scores on input variables: F̂ = Ŵ Y, where Ŵ is the weight matrix.
• These weights are NOT the factor loadings!
• Different approaches exist for estimating Ŵ (e.g. regression method)
• Factor scores are not unique
• Using factor scores instead of factor indicators can reduce measurement error, but does NOT remove it.
• Therefore, using factor scores as predictors in conventional
regressions leads to inconsistent coefficient estimators!
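
As an illustration of the regression method, the sketch below assumes standardized variables and a single factor with variance 1. The weights are then Ŵ = Λ'R⁻¹, with R the correlation matrix of Y. The loadings are hypothetical and the variable names are choices made here.

```python
import numpy as np

# Hypothetical one-factor loadings (column vector) for three standardized indicators.
loadings = np.array([[0.9], [0.8], [0.7]])
uniqueness = 1.0 - (loadings ** 2).sum(axis=1)
R = loadings @ loadings.T + np.diag(uniqueness)   # model-implied correlation matrix

# Regression-method weights: W = Lambda' R^{-1}, so that F_hat = W Y.
weights = np.linalg.solve(R, loadings).T

# Variance of the estimated score: W R W' = Lambda' R^{-1} Lambda.
score_variance = float((weights @ R @ weights.T)[0, 0])

print(np.round(weights, 3))
print(round(score_variance, 3))
```

The weights differ from the loadings, and the variance of the estimated score is below var(F) = 1. This is consistent with the point that factor scores still carry error relative to the true factor.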

Factor Analysis with
Categorical Observed Variables
• Factor analysis hinges on the correlation matrix
• As long as you can get an interpretable correlation matrix, you can perform factor analysis
• Binary/ordinal items?
  - Pearson correlation: Expect attenuation!
  - Tetrachoric correlation (binary)
  - Polychoric correlation (ordinal)

To obtain polychoric correlation in Stata:

polychoric var1 var2 var3 var4 var5 …
To run principal component analysis:
pcamat r(R), n(328)
To run factor analysis:
factormat r(R), fa(2) ipf n(328)
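
The attenuation of the Pearson correlation for binary items can be shown in one special case that has a closed form. The sketch assumes two underlying standard bivariate normal variables that are both dichotomized at their medians. The Pearson (phi) correlation of the binary items is then (2/π)·arcsin(ρ), and the relation can be inverted to recover ρ. General thresholds need numerical estimation, which is what a polychoric routine does. The function names are hypothetical.

```python
import math

def phi_from_median_split(rho):
    """Pearson (phi) correlation of two median-split bivariate-normal variables."""
    return 2.0 / math.pi * math.asin(rho)

def latent_rho_from_phi(phi):
    """Invert the relation above to recover the latent (tetrachoric) correlation."""
    return math.sin(math.pi * phi / 2.0)

latent_rho = 0.6                                   # assumed latent correlation
observed_phi = phi_from_median_split(latent_rho)   # what Pearson sees on the binary items
recovered_rho = latent_rho_from_phi(observed_phi)

print(round(observed_phi, 3), round(recovered_rho, 3))
```

The observed correlation is noticeably smaller than the latent one, so a factor analysis run on Pearson correlations of binary items would tend to understate loadings.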

Criticisms of Factor Analysis
• Labels of factors can be arbitrary or lack scientific basis
• Derived factors often very obvious
  - defense: but we get a quantification
• “Garbage in, garbage out”
  - really a criticism of input variables
  - factor analysis reorganizes input matrix
• Correlation matrix is often poor measure of association of input variables.

Major steps in EFA

Part II. Confirmatory Factor
Analysis (CFA)

Exploratory vs. Confirmatory
Factor Analysis
• Exploratory:
  - summarize data
  - describe correlation structure between variables
  - generate hypotheses
• Confirmatory
  - Testing correlated measurement errors
  - Redundancy test of one-factor vs. multi-factor models
  - Measurement invariance test comparing a model across groups
  - Orthogonality tests
Confirmatory Factor Analysis (CFA)
• Takes factor analysis a step further.
• We can “test” or “confirm” or “implement” a “highly constrained a priori structure that meets conditions of model identification”
• But be careful, a model can never be confirmed!!
• CFA model is constructed in advance
  - number of latent variables (“factors”) is pre-set by analyst (not part of the modeling usually)
  - Whether latent variable influences observed is specified
  - Measurement errors may correlate
• Difference between CFA and the usual SEM:
  - SEM assumes causally interrelated latent variables
  - CFA assumes interrelated latent variables (i.e. exogenous)
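Exploratory Factor Analysis
Two factor model:
x = Λξ + δ

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix}
\lambda_{11} & \lambda_{12} \\
\lambda_{21} & \lambda_{22} \\
\lambda_{31} & \lambda_{32} \\
\lambda_{41} & \lambda_{42} \\
\lambda_{51} & \lambda_{52} \\
\lambda_{61} & \lambda_{62}
\end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}
+
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{bmatrix}
\]

[Path diagram: ξ1 and ξ2 each point to all six indicators x1 to x6; each xi has its own error term δi.]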

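CFA Notation
Two factor model:
x = Λξ + δ

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix}
\lambda_{11} & 0 \\
\lambda_{21} & 0 \\
\lambda_{31} & 0 \\
0 & \lambda_{42} \\
0 & \lambda_{52} \\
0 & \lambda_{62}
\end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}
+
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{bmatrix}
\]

[Path diagram: ξ1 points to x1, x2, x3 and ξ2 points to x4, x5, x6; each xi has its own error term δi.]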

Difference between CFA and EFA
CFA:                      EFA:
x1 = λ11 ξ1 + δ1          x1 = λ11 ξ1 + λ12 ξ2 + δ1
x2 = λ21 ξ1 + δ2          x2 = λ21 ξ1 + λ22 ξ2 + δ2
x3 = λ31 ξ1 + δ3          x3 = λ31 ξ1 + λ32 ξ2 + δ3
x4 = λ42 ξ2 + δ4          x4 = λ41 ξ1 + λ42 ξ2 + δ4
x5 = λ52 ξ2 + δ5          x5 = λ51 ξ1 + λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6          x6 = λ61 ξ1 + λ62 ξ2 + δ6
cov(ξ1, ξ2) = φ12         cov(ξ1, ξ2) = 0
Model Constraints
• Hallmark of CFA
• Purposes for setting constraints:
  - Test a priori theory
  - Ensure identifiability
  - Test reliability of measures

• Let θ be a t×1 vector containing all unknown and unconstrained parameters in a model. The parameters θ are identified if Σ(θ1) = Σ(θ2) ⇔ θ1 = θ2
• Estimability ≠ Identifiability!!
  - Identifiability: attribute of the model
  - Estimability: attribute of the data

Model Constraints: Identifiability

• Latent variables (LVs) need some constraints
  - Because factors are unmeasured, their variances can take different values
  - Recall EFA where we constrained factors: F ~ N(0,1)
• Otherwise, model is not identifiable.
• Here we have two options:
  - Fix variance of latent variables (LV) to be 1 (or another constant)
  - Fix one path between LV and indicator
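Necessary Constraints
[Path diagrams for the two-factor model (ξ1 with x1, x2, x3; ξ2 with x4, x5, x6; errors δ1 to δ6). Fix variances (left): the variances of ξ1 and ξ2 are fixed to 1. Fix path (right): the paths from ξ1 to x1 and from ξ2 to x4 are fixed to 1.]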

Model Parametrization
Fix variances:            Fix path:
x1 = λ11 ξ1 + δ1          x1 = ξ1 + δ1
x2 = λ21 ξ1 + δ2          x2 = λ21 ξ1 + δ2
x3 = λ31 ξ1 + δ3          x3 = λ31 ξ1 + δ3
x4 = λ42 ξ2 + δ4          x4 = ξ2 + δ4
x5 = λ52 ξ2 + δ5          x5 = λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6          x6 = λ62 ξ2 + δ6
cov(ξ1, ξ2) = φ12         cov(ξ1, ξ2) = φ12
var(ξ1) = 1               var(ξ1) = φ11
var(ξ2) = 1               var(ξ2) = φ22
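
The two parametrizations describe the same covariance structure. The sketch below assumes the usual CFA form Σ = ΛΦΛ' + Θ and converts a "fix variances" solution into the matching "fix path" solution by rescaling each factor. The numerical values and names are hypothetical.

```python
import numpy as np

def implied_cov(Lambda, Phi, Theta):
    """Model-implied covariance matrix of a CFA: Lambda Phi Lambda' + Theta."""
    return Lambda @ Phi @ Lambda.T + Theta

# "Fix variances": factor variances are 1 and all six loadings are free.
Lambda_var = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.5],
    [0.0, 0.4],
])
Phi_var = np.array([[1.0, 0.3],
                    [0.3, 1.0]])
Theta = np.diag(1.0 - (Lambda_var ** 2).sum(axis=1))  # diagonal error variances

# "Fix path": rescale each factor so the loadings of x1 and x4 become 1.
scale = np.array([Lambda_var[0, 0], Lambda_var[3, 1]])
Lambda_path = Lambda_var / scale              # divides each column by its scale
Phi_path = Phi_var * np.outer(scale, scale)   # factor variances become free

sigma_var = implied_cov(Lambda_var, Phi_var, Theta)
sigma_path = implied_cov(Lambda_path, Phi_path, Theta)
print(np.allclose(sigma_var, sigma_path))
print(np.round(Phi_path, 3))
```

Any positive rescaling of a factor leaves Σ unchanged, so without one of the two constraints the loadings and factor variances cannot be separated. Each constraint simply picks one representative from that family.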
Identifiability Rules for CFA
(1) Two-indicator rule (sufficient, not necessary)
1) At least two factors
2) At least two indicators per factor
3) Exactly one non-zero element per row of Λ
(translation: each x is pointed at by only one LV)
4) Non-correlated errors (Θ is diagonal)
(translation: no double-headed arrows between the δ’s)
5) Factors are correlated (Φ has no zero elements)*
(translation: there are double-headed arrows between all of
the LVs)
* Alternative less strict criterion: each factor is correlated with
at least one other factor.
(see page 247 in Bollen)

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix}
\lambda_{11} & 0 \\
\lambda_{21} & 0 \\
\lambda_{31} & 0 \\
0 & \lambda_{42} \\
0 & \lambda_{52} \\
0 & \lambda_{62}
\end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}
+
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{bmatrix}
\]

\[
\Theta = \operatorname{var}(\delta) =
\operatorname{diag}(\theta_{11}, \theta_{22}, \theta_{33}, \theta_{44}, \theta_{55}, \theta_{66}),
\qquad
\Phi = \operatorname{var}(\xi) =
\begin{bmatrix} 1 & \phi_{12} \\ \phi_{12} & 1 \end{bmatrix}
\]

[Path diagram: ξ1 points to x1, x2, x3 and ξ2 points to x4, x5, x6; each xi has its own error term δi.]
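Example: Two-Indicator Rule
[Path diagram: three factors ξ1, ξ2, ξ3 with two indicators each (x1, x2; x3, x4; x5, x6) and errors δ1 to δ6; a path fixed to 1 is marked at x1.]

Example: Two-Indicator Rule
[Path diagram: the same three factors and six indicators as on the previous slide.]

Example: Two-Indicator Rule
[Path diagram: four factors ξ1 to ξ4 with two indicators each (x1, x2; x3, x4; x5, x6; x7, x8) and errors δ1 to δ8; a path fixed to 1 is marked at x1.]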
Identifiability Rules for CFA
(2) Three-indicator rule (sufficient, not necessary)
1) at least one factor
2) at least three indicators per factor
3) one non-zero element per row of Λ
(translation: each x is pointed at by only one LV)
4) non-correlated errors (Θ is diagonal)
(translation: no double-headed arrows between the δ’s)
[Note: no condition about correlation of factors (no
restrictions on Φ).]

[Path diagram: one factor ξ1 pointing to three indicators x1, x2, x3, each with its own error term δ1, δ2, δ3.]
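
For the smallest case covered by the three-indicator rule (one factor, three indicators), identification can be seen directly: the loadings have a closed-form solution in terms of the observed covariances. The sketch assumes var(ξ1) = 1, uncorrelated errors, and positive loadings. The function name and the numerical covariances are hypothetical.

```python
import math

def one_factor_three_indicator_loadings(s12, s13, s23):
    """Solve s_ij = lambda_i * lambda_j for three indicators of one factor.

    Assumes var(xi) = 1, uncorrelated errors, and positive loadings.
    """
    lam1 = math.sqrt(s12 * s13 / s23)
    lam2 = math.sqrt(s12 * s23 / s13)
    lam3 = math.sqrt(s13 * s23 / s12)
    return lam1, lam2, lam3

# Hypothetical covariances generated by loadings 0.9, 0.8, 0.7.
s12, s13, s23 = 0.72, 0.63, 0.56
lam1, lam2, lam3 = one_factor_three_indicator_loadings(s12, s13, s23)

# With unit indicator variances, the error variances follow as 1 - lambda_i^2.
error_variances = [1.0 - lam ** 2 for lam in (lam1, lam2, lam3)]

print(round(lam1, 3), round(lam2, 3), round(lam3, 3))
```

Three covariances determine three loadings uniquely, and the three indicator variances then determine the three error variances. With only two indicators and one factor there is a single covariance for two loadings, so this direct solution is not available.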

