Harvard Lecture Series Session 4 - Factor Analysis


Factor Analysis

Qian-Li Xue
Biostatistics Program
Harvard Catalyst | The Harvard Clinical & Translational Science
Center
Short course, October 27, 2016

1
Well-used latent variable models

                          Observed variable scale
Latent variable scale     Continuous                         Discrete
Continuous                Factor analysis, LISREL            Discrete FA, IRT (item response)
Discrete                  Latent profile, Growth mixture     Latent class analysis, regression

General software: MPlus, Latent Gold, WinBugs (Bayesian), NLMIXED (SAS)
Objectives
• What is factor analysis?
• What do we need factor analysis for?
• What are the modeling assumptions?
• How do we specify, fit, and interpret factor models?
• What is the difference between exploratory and confirmatory factor analysis?
• What is model identifiability and how do we assess it?

3
What is factor analysis?
• Factor analysis is a theory-driven statistical data reduction technique
used to explain covariance among observed random variables in terms of
a smaller number of unobserved random variables called factors

4
An Example: General Intelligence
(Charles Spearman, 1904)

[Path diagram: one common factor F (General Intelligence) with arrows to the
observed variables Y1–Y5, each with its own residual ε1–ε5]

5
Why Factor Analysis?
1. Testing of theory
   • Explain covariation among multiple observed variables by mapping them
     to latent constructs (called “factors”)
2. Understanding the structure underlying a set of measures
   • Gain insight into the underlying dimensions
   • Construct validation (e.g., convergent validity)
3. Scale development
   • Exploit redundancy to improve a scale’s validity and reliability

6
Part I. Exploratory Factor
Analysis (EFA)

7
One Common Factor Model:
Model Specification

[Path diagram: factor F with loadings λ1, λ2, λ3 on Y1, Y2, Y3 and residuals ε1, ε2, ε3]

Y1 = λ1 F + ε1
Y2 = λ2 F + ε2
Y3 = λ3 F + ε3

• The factor F is not observed; only Y1, Y2, Y3 are observed
• εi represents the variability in Yi NOT explained by F
• Yi is a linear function of F and εi

8
One Common Factor Model:
Model Assumptions

Y1 = λ1 F + ε1
Y2 = λ2 F + ε2
Y3 = λ3 F + ε3

• Factorial causation
• F is independent of εj, i.e. cov(F, εj) = 0
• εi and εj are independent for i ≠ j, i.e. cov(εi, εj) = 0
• Conditional independence: Given the factor, observed variables are
independent of one another, i.e. cov(Yi, Yj | F) = 0

9
One Common Factor Model:
Model Interpretation

Given all variables in standardized form, i.e. var(Yi) = var(F) = 1:

• Factor loadings: λi
   λi = corr(Yi, F)

• Communality of Yi: hi²
   hi² = λi² = [corr(Yi, F)]²
   = % variance of Yi explained by F

• Uniqueness of Yi: 1 − hi²
   = residual variance of Yi

• Degree of factorial determination:
   = Σ λi² / n, where n = # observed variables Y

10
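As a quick illustration (the numbers here are hypothetical, not from the lecture): if a standardized item Y1 has loading λ1 = 0.8 on F, then

$$h_1^2 = \lambda_1^2 = 0.8^2 = 0.64,$$

so F explains 64% of the variance of Y1 and the uniqueness is 1 − 0.64 = 0.36.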
Two-Common Factor Model (Orthogonal):
Model Specification

[Path diagram: factors F1 and F2 with loadings λ11–λ61 and λ12–λ62 on Y1–Y6
and residuals ε1–ε6]

Y1 = λ11 F1 + λ12 F2 + ε1
Y2 = λ21 F1 + λ22 F2 + ε2
Y3 = λ31 F1 + λ32 F2 + ε3
Y4 = λ41 F1 + λ42 F2 + ε4
Y5 = λ51 F1 + λ52 F2 + ε5
Y6 = λ61 F1 + λ62 F2 + ε6

F1 and F2 are common factors because they are shared by ≥ 2 variables!

11
Matrix Notation
with n variables and m factors

Y(n×1) = Λ(n×m) F(m×1) + ε(n×1)

where
Y = (Y1, …, Yn)′ is the vector of observed variables,
Λ = {λij} is the n×m matrix of factor loadings (λ11 … λ1m in the first row,
λn1 … λnm in the last),
F = (F1, …, Fm)′ is the vector of common factors, and
ε = (ε1, …, εn)′ is the vector of unique errors.

12
Factor Pattern Matrix
• Columns represent derived factors
• Rows represent input variables
• Loadings represent the degree to which each of the variables “correlates”
with each of the factors (these are the entries λij of the n×m matrix Λ from
the previous slide)
• Loadings range from -1 to 1
• Inspection of the factor loadings reveals the extent to which each of the
variables contributes to the meaning of each of the factors
• High loadings provide meaning and interpretation of factors
(~ regression coefficients)

13
Two-Common Factor Model (Orthogonal):
Model Assumptions

• Factorial causation
• F1 and F2 are independent of εj, i.e. cov(F1, εj) = cov(F2, εj) = 0
• εi and εj are independent for i ≠ j, i.e. cov(εi, εj) = 0
• Conditional independence: Given factors F1 and F2, observed variables are
independent of one another, i.e. cov(Yi, Yj | F1, F2) = 0 for i ≠ j
• Orthogonal (= independent): cov(F1, F2) = 0

14
Two-Common Factor Model (Orthogonal):
Model Interpretation

Given all variables in standardized form, i.e. var(Yi) = var(Fj) = 1,
AND orthogonal factors, i.e. cov(F1, F2) = 0:

• Factor loadings: λij
   λij = corr(Yi, Fj)

• Communality of Yi: hi²
   hi² = λi1² + λi2² = % variance of Yi explained by F1 AND F2

• Uniqueness of Yi: 1 − hi²

• Degree of factorial determination:
   = Σ λij² / n, where n = # observed variables Y

15
Two-Common Factor Model:
The Oblique Case

Given all variables in standardized form, i.e. var(Yi) = var(Fj) = 1,
AND oblique factors, i.e. cov(F1, F2) ≠ 0:

• The interpretation of the factor loadings λij changes: a loading is no longer
the correlation between Yi and Fj; it is the direct effect of Fj on Yi

• The calculation of the communality of Yi (hi²) is more complex
(see the formula sketched below)

16
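As a sketch of why the oblique communality is more complex: with two standardized, correlated factors (writing φ12 = cov(F1, F2)), the variance of Yi explained by the factors becomes

$$h_i^2 = \lambda_{i1}^2 + \lambda_{i2}^2 + 2\,\lambda_{i1}\lambda_{i2}\,\varphi_{12},$$

so a cross-product term involving the factor correlation enters the communality; when φ12 = 0 this reduces to the orthogonal formula on the previous slide.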
Extracting initial factors
 Least-squares method (e.g. principal axis
factoring with iterated communalities)
 Maximum likelihood method

17
Model Fitting: Extracting initial factors
Least-squares method (LS) (e.g. principal axis factoring with
iterated communalities)
 Goal: minimize the sum of squared differences
between observed and estimated corr. matrices
 Fitting steps:
a) Obtain initial estimates of communalities (h2)
e.g. squared correlation between a variable and the
remaining variables
b) Solve the objective function det(R_LS − ηI) = 0,
where R_LS is the correlation matrix with h² in the main diagonal (also
termed the adjusted correlation matrix) and η is an eigenvalue
c) Re-estimate h2
d) Repeat b) and c) until no improvement can be made
18
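A minimal Stata sketch of this extraction, assuming hypothetical items x1–x6 in memory (the ipf option requests principal factors with iterated communalities, i.e. the LS procedure above):

* Least-squares extraction: iterated principal factoring, 2 factors retained
factor x1-x6, ipf factors(2)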
Model Fitting: Extracting initial factors
Maximum likelihood method (MLE)
• Goal: maximize the likelihood of producing the observed corr matrix
• Assumption: the distribution of the variables (Y and F) is multivariate normal
• Objective function: det(R_MLE − ηI) = 0,
  where R_MLE = U⁻¹(R − U²)U⁻¹ = U⁻¹ R_LS U⁻¹ and U² = diag(1 − h²)
• Iterative fitting algorithm similar to the LS approach
   – Exception: R is adjusted by giving greater weight to correlations involving
     variables with smaller unique variance, i.e. smaller 1 − h²
• Advantage: availability of a large-sample χ² significance test for
goodness-of-fit (but it tends to select more factors for large n!)

19
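The corresponding maximum likelihood fit in Stata is a one-line change (again with hypothetical items x1–x6); with the ml option Stata also reports likelihood-ratio χ² tests that can help judge the number of factors:

* Maximum likelihood extraction, 2 factors retained
factor x1-x6, ml factors(2)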
Choosing among Different Methods
• Between MLE and LS
• LS is preferred with
   – few indicators per factor
   – equal loadings within factors
   – no large cross-loadings
   – no factor correlations
   – recovering factors with low loadings (overextraction)
• MLE is preferred with
   – multivariate normality
   – unequal loadings within factors
• Both MLE and LS may have convergence problems

20
Factor Rotation
 Goal is simple structure
 Make factors more easily interpretable
 While keeping the number of factors and
communalities of Ys fixed!!!
 Rotation does NOT improve fit!

21
Factor Rotation

To do this we “rotate” factors:


 redefine factors such that ‘loadings’ (or
pattern matrix coefficients) on various factors
tend to be very high (-1 or 1) or very low (0)
 intuitively, it makes sharper distinctions in the
meanings of the factors

22
Factor Rotation (Intuitively)

[Plot: variables 1–5 plotted against the unrotated axes (F1, F2) and against the
rotated axes; variables 1 and 2 coincide, and variables 4 and 5 lie close together]

Unrotated                          Rotated
      Factor 1   Factor 2               Factor 1   Factor 2
x1    0.4        0.69              x1   -0.8       0
x2    0.4        0.69              x2   -0.8       0
x3    0.65       0.32              x3   -0.6       0.4
x4    0.69       -0.4              x4    0         0.8
x5    0.61       -0.35             x5    0         0.7

23
Factor Rotation

• Uses the “ambiguity,” or non-uniqueness, of the solution to make
interpretation simpler

• Where does the ambiguity come in?
   – The unrotated solution is based on the idea that each factor tries to
     maximize the variance explained, conditional on the previous factors
   – What if we take that requirement away?
   – Then there is no single “best” solution

24
Factor Rotation:
Orthogonal vs. Oblique Rotation

 Orthogonal: Factors are independent


 varimax: maximize variance of squared loadings
across variables (sum over factors)
 Goal: the simplicity of interpretation of factors
 quartimax: maximize variance of squared loadings
across factors (sum over variables)
 Goal: the simplicity of interpretation of variables
 Intuition: from previous picture, there is a right
angle between axes
 Note: “Uniquenesses” remain the same!

25
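A minimal Stata sketch of orthogonal rotation, assuming a factor model has just been fit as in the earlier examples:

* Orthogonal rotations (communalities and uniquenesses are unchanged)
rotate, varimax      // simplify interpretation of factors
rotate, quartimax    // simplify interpretation of variables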
Factor Rotation:
Orthogonal vs. Oblique Rotation

 Oblique: Factors are NOT independent. Change


in “angle.”
 oblimin: minimize covariance of squared loadings
between factors.
 promax: simplify orthogonal rotation by making small
loadings even closer to zero.
 Target matrix: choose “simple structure” a priori.
 Intuition: from previous picture, angle between
axes is not necessarily a right angle.
 Note: “Uniquenesses” remain the same!
26
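And an oblique counterpart; after an oblique rotation, estat common displays the estimated correlation matrix of the rotated factors, which should be reported alongside the loadings:

* Oblique (promax) rotation: factors are allowed to correlate
rotate, promax
estat common    // correlations among the rotated factors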
Pattern versus Structure Matrix
 In oblique rotation, one typically presents both a pattern
matrix and a structure matrix

 Also need to report correlation between the factors

 The pattern matrix presents the usual factor loadings

 The structure matrix presents correlations between the


variables and the factors

 For orthogonal factors, pattern matrix=structure matrix

 The pattern matrix is used to interpret the factors


27
Factor Rotation: Which to use?

 Choice is generally not critical


 Interpretation with orthogonal (varimax) is
“simple” because factors are independent:
“Loadings” are correlations.
 Configuration may appear more simple in
oblique (promax), but correlation of factors
can be difficult to reconcile.
 Theory? Are the conceptual meanings of the
factors associated?

28
Factor Rotation: Unique Solution?

 The factor analysis solution is NOT unique!


 More than one solution will yield the same
“result.”

29
Derivation of Factor Scores
• Each object (e.g. each person) gets a factor score for each factor
• The factors themselves are variables
• An object’s score is a weighted combination of its scores on the input
variables: F̂ = Ŵ Y, where Ŵ is the weight matrix
• These weights are NOT the factor loadings!
• Different approaches exist for estimating Ŵ (e.g. the regression method)
• Factor scores are not unique
• Using factor scores instead of the factor indicators can reduce
measurement error, but does NOT remove it
• Therefore, using factor scores as predictors in conventional
regressions leads to inconsistent coefficient estimators!

30
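A minimal Stata sketch of the regression method mentioned above (f1 and f2 are arbitrary new variable names for the scores of a two-factor solution):

* Factor scores via the regression method (default for predict after factor)
predict f1 f2, regression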
Factor Analysis with
Categorical Observed Variables
• Factor analysis hinges on the correlation matrix
• As long as you can get an interpretable correlation matrix, you can
perform factor analysis
• Binary/ordinal items?
   – Pearson correlation: expect attenuation!
   – Tetrachoric correlation (binary)
   – Polychoric correlation (ordinal)

To obtain polychoric correlations in Stata (user-written polychoric command):
polychoric var1 var2 var3 var4 var5 …
To run principal component analysis on the resulting matrix:
pcamat r(R), n(328)
To run factor analysis on the resulting matrix:
factormat r(R), fa(2) ipf n(328)

31
Criticisms of Factor Analysis
• Labels of factors can be arbitrary or lack scientific basis
• Derived factors are often very obvious
   – defense: but we get a quantification
• “Garbage in, garbage out”
   – really a criticism of the input variables
   – factor analysis reorganizes the input matrix
• The correlation matrix is often a poor measure of association between the
input variables

32
Major steps in EFA

33
Part II. Confirmatory Factor
Analysis (CFA)

34
Exploratory vs. Confirmatory
Factor Analysis
 Exploratory:
 summarize data
 describe correlation structure between variables
 generate hypotheses
 Confirmatory
 Testing correlated measurement errors
 Redundancy test of one-factor vs. multi-factor models
 Measurement invariance test comparing a model
across groups
 Orthogonality tests
35
Confirmatory Factor Analysis (CFA)
• Takes factor analysis a step further
• We can “test,” “confirm,” or “implement” a highly constrained a priori
structure that meets the conditions of model identification
• But be careful: a model can never be confirmed!!
• The CFA model is constructed in advance
   – The number of latent variables (“factors”) is pre-set by the analyst
     (not usually part of the modeling)
   – Whether a latent variable influences an observed variable is specified
   – Measurement errors may be allowed to correlate
• Difference between CFA and the usual SEM:
   – SEM assumes causally interrelated latent variables
   – CFA assumes latent variables that are correlated but not causally
     related (i.e. all latent variables are exogenous)
36
Exploratory Factor Analysis
Two-factor model: x = Λξ + δ

x1 = λ11 ξ1 + λ12 ξ2 + δ1
x2 = λ21 ξ1 + λ22 ξ2 + δ2
x3 = λ31 ξ1 + λ32 ξ2 + δ3
x4 = λ41 ξ1 + λ42 ξ2 + δ4
x5 = λ51 ξ1 + λ52 ξ2 + δ5
x6 = λ61 ξ1 + λ62 ξ2 + δ6

[Path diagram: ξ1 and ξ2 each load on all of x1–x6; residuals δ1–δ6]

37
CFA Notation
Two-factor model: x = Λξ + δ

x1 = λ11 ξ1 + δ1
x2 = λ21 ξ1 + δ2
x3 = λ31 ξ1 + δ3
x4 = λ42 ξ2 + δ4
x5 = λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6

[Path diagram: ξ1 loads on x1–x3 only; ξ2 loads on x4–x6 only; residuals δ1–δ6]

38
Difference between CFA and EFA

CFA                            EFA
x1 = λ11 ξ1 + δ1               x1 = λ11 ξ1 + λ12 ξ2 + δ1
x2 = λ21 ξ1 + δ2               x2 = λ21 ξ1 + λ22 ξ2 + δ2
x3 = λ31 ξ1 + δ3               x3 = λ31 ξ1 + λ32 ξ2 + δ3
x4 = λ42 ξ2 + δ4               x4 = λ41 ξ1 + λ42 ξ2 + δ4
x5 = λ52 ξ2 + δ5               x5 = λ51 ξ1 + λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6               x6 = λ61 ξ1 + λ62 ξ2 + δ6
cov(ξ1, ξ2) = φ12              cov(ξ1, ξ2) = 0
39
Model Constraints
 Hallmark of CFA
 Purposes for setting constraints:
 Test a priori theory
 Ensure identifiability
 Test reliability of measures

40
Identifiability
• Let θ be a t×1 vector containing all unknown and unconstrained parameters
in a model. The parameters θ are identified if
Σ(θ1) = Σ(θ2) ⟹ θ1 = θ2,
where Σ(θ) is the model-implied covariance matrix
• Estimability ≠ Identifiability !!
   – Identifiability – attribute of the model
   – Estimability – attribute of the data

41
Model Constraints: Identifiability

• Latent variables (LVs) need some constraints
   – Because factors are unmeasured, their variances can take different values
   – Recall that in EFA we constrained the factors: F ~ N(0,1)
   – Otherwise, the model is not identifiable
• Here we have two options:
   – Fix the variance of each latent variable (LV) to 1 (or another constant)
   – Fix one path between each LV and an indicator
42
Necessary Constraints

[Path diagrams of the same two-factor CFA under the two options:
 – Fix variances: var(ξ1) = var(ξ2) = 1, all six loadings free
 – Fix path: one loading per factor fixed to 1 (ξ1 → x1 and ξ2 → x4), factor
   variances free]

43
Model Parametrization

Fix variances:                    Fix path:
x1 = λ11 ξ1 + δ1                  x1 = ξ1 + δ1
x2 = λ21 ξ1 + δ2                  x2 = λ21 ξ1 + δ2
x3 = λ31 ξ1 + δ3                  x3 = λ31 ξ1 + δ3
x4 = λ42 ξ2 + δ4                  x4 = ξ2 + δ4
x5 = λ52 ξ2 + δ5                  x5 = λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6                  x6 = λ62 ξ2 + δ6
cov(ξ1, ξ2) = φ12                 cov(ξ1, ξ2) = φ12
var(ξ1) = 1                       var(ξ1) = φ11
var(ξ2) = 1                       var(ξ2) = φ22
44
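A minimal Stata sem sketch of this CFA under the “fix path” scheme (hypothetical indicators x1–x6; the latent names Xi1 and Xi2 are arbitrary). By default sem anchors each factor by constraining its first loading to 1, matching the right-hand column above, and lets the exogenous factors covary; fixing the factor variances to 1 instead corresponds to the left-hand column:

* Two-factor CFA; first loading of each factor is fixed to 1 by default
sem (Xi1 -> x1 x2 x3) (Xi2 -> x4 x5 x6), standardized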
Identifiability Rules for CFA
(1) Two-indicator rule (sufficient, not necessary)
1) At least two factors
2) At least two indicators per factor
3) Exactly one non-zero element per row of Λ
(translation: each x is pointed at by only one LV)
4) Non-correlated errors (Θ is diagonal)
(translation: no double-headed arrows between the δ’s)
5) Factors are correlated (Φ has no zero elements)*
(translation: there are double-headed arrows between all of
the LVs)
* Alternative, less strict criterion: each factor is correlated with
at least one other factor.
(see page 247 in Bollen)

45

[Path diagram: ξ1 → x1, x2, x3 and ξ2 → x4, x5, x6, with the first loading of each
factor fixed to 1; residuals δ1–δ6; ξ1 and ξ2 correlated]

Loading matrix Λ (exactly one non-zero element per row):
   λ11   0
   λ21   0
   λ31   0
   0     λ42
   0     λ52
   0     λ62

Error covariance matrix Θ (diagonal, i.e. uncorrelated errors):
   Θ = diag(θ11, θ22, θ33, θ44, θ55, θ66)

Factor covariance matrix Φ (factors correlated):
   Φ = [ var(ξ1)   φ12
         φ12       var(ξ2) ]

46
Example: Two-Indicator Rule

[Path diagram: three factors, each with two indicators and its first loading fixed
to 1 — ξ1 → x1, x2; ξ2 → x3, x4; ξ3 → x5, x6 — with correlations among the factors]

47
Example: Two-Indicator Rule

[Path diagram: a second configuration of the three two-indicator factors
ξ1 (x1, x2), ξ2 (x3, x4), ξ3 (x5, x6)]

48
Example: Two-Indicator Rule

[Path diagram: four two-indicator factors — ξ1 (x1, x2), ξ2 (x3, x4), ξ3 (x5, x6),
ξ4 (x7, x8) — each with its first loading fixed to 1]

49
Identifiability Rules for CFA
(2) Three-indicator rule (sufficient, not necessary)
1) At least one factor
2) At least three indicators per factor
3) Exactly one non-zero element per row of Λ
(translation: each x is pointed at by only one LV)
4) Non-correlated errors (Θ is diagonal)
(translation: no double-headed arrows between the δ’s)
[Note: no condition about correlation of factors (no restrictions on Φ).]

[Path diagram: one factor ξ1 with three indicators x1, x2, x3 and residuals δ1–δ3]

50
