MVLR 07 Ttdam Factor Analysis


Online Short Term Training Programme on

Tools for Transportation Data Analysis & Modelling


27th May to 1st June, 2021

Factor Analysis

M.V.L.R. Anjaneyulu
9447282115

Centre for Transportation Research (CTR)


Department of Civil Engineering
National Institute of Technology Calicut
Outline

1) Overview

2) Basic Concept

3) Factor Analysis Model

4) Statistics Associated with Factor Analysis

5) Conducting Factor Analysis

CTR Tools for Transportation Data Analysis & Modelling 2


Introduction
• Factor Analysis is a method for modeling observed variables, and their covariance
structure, in terms of a smaller number of underlying unobservable (latent) “factors.”
• The key concept of factor analysis is that multiple observed variables have similar patterns
of responses because they are all associated with a latent (i.e., not directly measured)
variable.
• Its primary purpose is to define the underlying structure among the variables in the analysis.
• Factor analysis is an interdependence technique: it makes no distinction between dependent and
independent variables.
• Factor analysis is a class of procedures used for data reduction and summarization.
• These groups of variables (factors), which are by definition highly intercorrelated, are
assumed to represent dimensions within the data.



Introduction
• Factor analysis applied to a set of observed variables seeks to find underlying factors from
which the observed variables were generated

• Because the factors are not observable, regression cannot be used for this analysis

• If we have a conceptual basis for understanding the relationships between variables, then
the dimensions may actually have meaning for what they collectively represent.

• It is a method for investigating whether a number of variables of interest Y1, Y2, ..., Yl are
linearly related to a smaller number of unobservable factors F1, F2, ..., Fk



Introduction

• Observed variables: Income, Education, Occupation, House value, Number of public parks,
Number of violent crimes per year
– Underlying factors: Individual socioeconomic status, Neighbourhood socioeconomic status
• Observed variables: Analytical transportation planning, EIA, Pavement management
– Underlying factors: Analytical ability, Verbal ability
• Observed variables: Price level, Store personnel, Return policy, Product availability,
Product quality, Assortment depth, Assortment width, In store service, Store atmosphere
– Underlying factors: In store experience, Product offerings, Value



Introduction

1. 100-m run
2. Long jump
3. Shot put
4. High jump
5. 400-m run
6. 100m hurdles
7. Discus
8. Pole vault
9. Javelin
10. 1500-m run

1. Household size
2. No. of working persons
3. No. of school going children
4. No. of college going children
5. No. of retired persons
6. Vehicle availability
7. Driving license

1. Gender
2. Age
3. Occupation
4. Marital status
5. Income

1. Purpose of trip
2. Mode of travel
3. Travel cost
4. Travel time
5. Joint activity
6. Shared ride



Example 1
• Students of a PG program take three courses: Analytical Transportation Planning (ATP), EIA of
Transportation Projects (EIA) and Pavement Management (PM).
• Let Y1, Y2, and Y3, respectively, represent a student's grade points (scores) in these courses.
• Grades of 5 students (on a 10-point numerical scale) are shown below.

Student   ATP, Y1   EIA, Y2   PM, Y3
1          3         6         5
2          7         3         3
3         10         9         8
4          3         9         7
5         10         6         5

• These grades might be functions of two underlying factors, F1 and F2, rather loosely described as
‘quantitative ability’ and ‘verbal ability’.
• It is assumed that each Y variable is linearly related to the two factors.
• Error terms e1, e2, and e3 serve to indicate that the hypothesized relationships are not exact.
• In the special vocabulary of factor analysis, the parameters βij are referred to as loadings.
• For example, β12 is called the loading of variable Y1 on factor F2.
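
The linear relationships for Example 1 are not reproduced in this text. A sketch of the usual two-factor form, consistent with the βij and ei notation used in the example, is given below. The intercepts β10, β20 and β30 are an assumption; they drop out if the grades are measured as deviations from their means.

```latex
\begin{aligned}
Y_1 &= \beta_{10} + \beta_{11} F_1 + \beta_{12} F_2 + e_1 \\
Y_2 &= \beta_{20} + \beta_{21} F_1 + \beta_{22} F_2 + e_2 \\
Y_3 &= \beta_{30} + \beta_{31} F_1 + \beta_{32} F_2 + e_3
\end{aligned}
```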



Example 2
                          V1     V2     V3     V4     V5     V6     V7     V8     V9
V1 Price level           1.000
V2 Store personnel       0.427  1.000
V3 Return policy         0.302  0.771  1.000
V4 Product availability  0.470  0.497  0.427  1.000
V5 Product quality       0.765  0.406  0.307  0.427  1.000
V6 Assortment depth      0.281  0.445  0.423  0.713  0.325  1.000
V7 Assortment width      0.345  0.490  0.471  0.719  0.378  0.724  1.000
V8 In store service      0.242  0.719  0.733  0.428  0.240  0.311  0.435  1.000
V9 Store atmosphere      0.372  0.737  0.774  0.479  0.326  0.429  0.466  0.710  1.000



Example 2
Variables grouped together by factor analysis

                          V3     V8     V9     V2     V6     V7     V4     V1     V5
In store experience
V3 Return policy         1.000
V8 In store service      0.773  1.000
V9 Store atmosphere      0.771  0.710  1.000
V2 Store personnel       0.771  0.719  0.737  1.000
Product offerings
V6 Assortment depth      0.423  0.311  0.429  0.445  1.000
V7 Assortment width      0.471  0.435  0.466  0.490  0.724  1.000
V4 Product availability  0.427  0.428  0.479  0.497  0.713  0.719  1.000
Value
V1 Price level           0.302  0.242  0.372  0.427  0.281  0.354  0.470  1.000
V5 Product quality       0.307  0.240  0.326  0.406  0.325  0.378  0.427  0.765  1.000
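
The grouping in Example 2 can be checked numerically by comparing the average correlation within each group with the average correlation between groups. The following is a minimal sketch, not part of the original slides. It uses the values of the first (ungrouped) Example 2 matrix, and the group labels follow the three store factors named in the Introduction (In store experience, Product offerings, Value). The helper names `corr` and `mean` are illustrative.

```python
from itertools import combinations

# Lower triangle of the Example 2 correlation matrix, rows V1..V9.
lower = [
    [1.000],
    [0.427, 1.000],
    [0.302, 0.771, 1.000],
    [0.470, 0.497, 0.427, 1.000],
    [0.765, 0.406, 0.307, 0.427, 1.000],
    [0.281, 0.445, 0.423, 0.713, 0.325, 1.000],
    [0.345, 0.490, 0.471, 0.719, 0.378, 0.724, 1.000],
    [0.242, 0.719, 0.733, 0.428, 0.240, 0.311, 0.435, 1.000],
    [0.372, 0.737, 0.774, 0.479, 0.326, 0.429, 0.466, 0.710, 1.000],
]


def corr(i, j):
    """Correlation between Vi and Vj (1-based variable numbers)."""
    hi, lo = max(i, j), min(i, j)
    return lower[hi - 1][lo - 1]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Variable numbers assigned to each group.
groups = {
    "In store experience": [3, 8, 9, 2],
    "Product offerings": [6, 7, 4],
    "Value": [1, 5],
}

# Average correlation among the variables inside each group.
within = {
    name: mean(corr(i, j) for i, j in combinations(members, 2))
    for name, members in groups.items()
}

# Average correlation over all pairs of variables from different groups.
between = mean(
    corr(i, j)
    for a, b in combinations(groups.values(), 2)
    for i in a
    for j in b
)

for name, value in within.items():
    print(f"{name}: mean within-group correlation = {value:.3f}")
print(f"Mean between-group correlation = {between:.3f}")
```

High within-group averages alongside a clearly lower between-group average are what the grouped matrix displays visually.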



Example 3
• In a consumer preference study, a random sample of customers were asked to rate several
attributes of a new product. The responses, on a seven-point semantic scale, were tabulated and the
attribute correlation matrix is as given below.

Attribute                     1      2      3      4      5
1 Taste                      1.00   0.02   0.96   0.42   0.01
2 Good buy for money         0.02   1.00   0.13   0.71   0.85
3 Flavour                    0.96   0.13   1.00   0.50   0.11
4 Suitable for snack         0.42   0.71   0.50   1.00   0.79
5 Provides lots of energy    0.01   0.85   0.11   0.79   1.00

Factor loadings

Variable                    Factor 1   Factor 2
Taste                        0.56       0.82
Good buy for money           0.78      -0.53
Flavour                      0.65       0.75
Suitable for snack           0.94      -0.10
Provides lots of energy      0.80      -0.54

Correlation matrix with the attributes reordered

Attribute                     4      5      2      1      3
4 Suitable for snack         1.00
5 Provides lots of energy    0.79   1.00
2 Good buy for money         0.71   0.85   1.00
1 Taste                      0.42   0.01   0.02   1.00
3 Flavour                    0.50   0.11   0.13   0.96   1.00
Example 4

Observed variables:
1. 100-m run
2. Long jump
3. Shot put
4. High jump
5. 400-m run
6. 100m hurdles
7. Discus
8. Pole vault
9. Javelin
10. 1500-m run

Possible underlying factors:
a) General athletic ability
b) Strength
c) Running endurance
d) Leg strength





Exploratory & Confirmatory Approaches
Exploratory approach
• It is useful in searching for structure among a set of variables or as a data reduction method.
• In this perspective, exploratory factor analytic techniques “take what the data give you” and do not
set any a priori constraints on the estimation of components or the number of components to be
extracted.
Confirmatory approach
• The researcher has preconceived thoughts on the actual structure of the data, based on theoretical
support or prior research.
• The researcher may wish to test hypotheses involving issues such as which variables should be
grouped together on a factor (e.g., multiple items were specifically collected to represent store
image) or the precise number of factors.



Factor Analysis Decision Process
Stage 1: Objectives of Factor Analysis
• Identifying Structure Through Data Summarization
• Data Reduction
• Using Factor Analysis With Other Multivariate Techniques
• Variable Selection

Stage 2: Designing a Factor Analysis
• Correlations Among Variables or Respondents
• Variable Selection and Measurement Issues
• Sample Size

Stage 3: Assumptions in Factor Analysis

Stage 4: Deriving Factors and Assessing Overall Fit
• Common Factor Analysis Versus Component Analysis
• Criteria for the Number of Factors to Be Extracted
– Latent Root Criterion
– Percentage of Variance Criterion
– Scree Test Criterion
– Heterogeneity of the Respondents
– Summary of Factor Selection Criteria



Factor Analysis Decision Process
Stage 5: Interpreting the Factors
• Unrotated factors
• Rotation of Factors
– Orthogonal Rotation Methods
  • QUARTIMAX
  • VARIMAX
  • EQUIMAX
– Oblique Rotation Methods
• Criteria for the Significance of Factor Loadings
– Ensuring Practical Significance
– Assessing Statistical Significance
– Adjustments Based on the Number of Variables

Stage 5: Interpreting the Factors (Cont.)
• Interpreting a Factor Matrix
– Examine the Factor Matrix of Loadings
– Identify the Highest Loading For Each Variable
– Assess Communalities of the Variables
– Label the Factors

Stage 6: Validation of Factor Analysis

Stage 7: Additional Uses of the Factor Analysis Results
– Selecting Surrogate Variables for Subsequent Analysis
– Creating Summated Scales
  • Conceptual Definition
  • Dimensionality
  • Reliability
  • Validity
  • Summary
– Computing Factor Scores
– Selecting Among the Three Methods
Objectives of Factor Analysis
• The general purpose of exploratory factor analytic techniques is to find a way to condense
(summarize) the information contained in a number of original variables into a smaller set of new,
composite dimensions or variates (factors) with a minimum loss of information
• In meeting its objectives, exploratory factor analysis focuses on four issues:
– specifying the unit of analysis,
– achieving data summarization and/or data reduction,
– variable selection, and
– using exploratory factor analysis results with other multivariate techniques.



Specifying the unit of analysis
• Exploratory factor analysis is a general model: it can identify the structure of relationships
among either variables or cases (e.g., respondents) by examining either the correlations between
the variables or the correlations between the cases.
• R factor analysis - If the objective of the research is to summarize the characteristics (i.e.,
variables), factor analysis would be applied to a correlation matrix of the variables.
• It analyzes a set of variables to identify the dimensions for the variables that are latent (not easily
observed).
• Q factor analysis - Exploratory factor analysis also can be applied to a correlation matrix of the
individual cases based on their characteristics.
• This method combines or condenses large numbers of cases into distinctly different groups within a
larger population.
• The Q factor analysis approach is not utilized frequently because of computational difficulties.
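
The difference between the two units of analysis lies only in which correlation matrix is computed. The sketch below is illustrative and not from the original slides. It uses the grades of Example 1 and builds the variable-by-variable matrix used in R factor analysis and the case-by-case matrix used in Q factor analysis. The function names `pearson` and `correlation_matrix` are assumptions introduced here.

```python
from math import sqrt

# Grades from Example 1: rows are students (cases), columns are courses (variables).
grades = [
    [3, 6, 5],
    [7, 3, 3],
    [10, 9, 8],
    [3, 9, 7],
    [10, 6, 5],
]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(series):
    """Correlation of every sequence in `series` with every other one."""
    return [[pearson(a, b) for b in series] for a in series]


# R type: correlate the columns (courses) with each other -> 3 x 3 matrix.
courses = [list(column) for column in zip(*grades)]
r_matrix = correlation_matrix(courses)

# Q type: correlate the rows (students) with each other -> 5 x 5 matrix.
q_matrix = correlation_matrix(grades)

print("R type (variables):", len(r_matrix), "x", len(r_matrix[0]))
print("Q type (cases):    ", len(q_matrix), "x", len(q_matrix[0]))
```

With only three variables per student the Q-type correlations are very unstable; the example is meant only to show which dimension of the data matrix is correlated.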



Achieving data summarization and/or data reduction

• Exploratory factor analysis provides two distinct, but complementary, outcomes:
– data summarization and
– data reduction.
• In summarizing the data, exploratory factor analysis derives underlying dimensions that describe the
data in a much smaller number of concepts than the original individual variables.
• The researcher may extend this process by modifying the original dimensions (factors) to obtain
more interpretable and better understood factors.
• Data reduction extends this process by deriving an empirical value (factor or summated scale score)
for each dimension (factor) and then substituting this value for the original values.



Data Summarization
• The fundamental concept involved in data summarization is the definition of structure.
• Through structure, the researcher can view the set of variables at various levels of generalization,
ranging from the most detailed level (individual variables themselves) to the more generalized level,
where individual variables are grouped and then viewed not for what they represent individually,
but for what they represent collectively in expressing a concept.
• Exploratory factor analysis, as an interdependence technique, differs from the dependence
techniques, where one or more variables are explicitly considered the criterion or dependent
variable(s) and all others are the predictor or independent variables.
• In exploratory factor analysis, all variables are simultaneously considered with no distinction as to
dependent or independent variables.



Data Summarization
• Factor analysis still employs the concept of the variate, the linear composite of variables, but in
exploratory factor analysis, the variates (factors) are formed to maximize their explanation of the
entire variable set, not to predict a dependent variable(s).
• The goal of data summarization is achieved by defining a small number of factors that adequately
represent the original set of variables.
• Each of the observed (original) variables is a dependent variable that is a function of some
underlying and latent set of factors (dimensions) that are themselves made up of all other
variables.
• Thus, each variable is predicted by all of the factors, and indirectly by all the other variables.
• Conversely, one can look at each factor (variate) as a dependent variable that is a function of the
entire set of observed variables.
• Structure is defined by the interrelatedness among variables allowing for the specification of a
smaller number of dimensions (factors) representing the original set of variables



Data summarization without interpretation
• In the most basic form, exploratory factor analysis is based solely on the intercorrelations among
variables, with no regard as to whether they represent interpretable dimensions or not.
• Methods such as principal components regression employ this procedure strictly to reduce the
number of variables in the analysis with no specific regard as to what these dimensions represent.
• A large number of variables is thus represented by a much smaller set of dimensions.
• In doing so, exploratory factor analysis provides a much more parsimonious set of variables for the
analysis which can aid in model development and estimation.



Data summarization with interpretation
• Researchers may wish to interpret and label the dimensions for managerial purposes or even utilize
the procedure to assist in scale development.
• Scale development is a specific process focused on the identification of a set of items that
represent a construct (e.g., store image) in a quantifiable and objective manner.
• Researchers can examine each dimension and “fine tune” the variables included on that dimension
to be more interpretable and useful in a managerial setting.
• This provides a straightforward method of achieving the benefits of data summarization while also
creating factors for data reduction (see next section) which have substantive managerial impact.
• The choice of data summarization with or without interpretation will have an impact on the factor
extraction method applied: principal components analysis or common factor analysis.



Data Reduction
• Researchers generally also use data reduction techniques to
– (1) identify representative variables from the much larger set of variables represented by each
factor for use in subsequent multivariate analyses, or
– (2) create an entirely new set of variables, representing composites of the variables represented
by each factor, to replace the original set of variables.
• In both instances, the purpose is to retain the nature and character of the original variables, but
reduce the number of actual values included in the analysis (i.e., one per factor) to simplify the
subsequent multivariate analysis.
• Exploratory factor analysis provides the empirical basis for assessing the structure of variables and
the potential for creating these composite measures or selecting a subset of representative
variables for further analysis.



Data Reduction
• Data summarization treats the identification of the underlying dimensions or factors as an end in
itself.
• Thus, estimates of the factors and the contributions of each variable to the factors (termed loadings)
provide the basis for finalizing the content of each factor. Then, any of the data reduction
approaches can be used with the results of the data summarization process.



Variable selection
• Whether exploratory factor analysis is used for data reduction and/or summarization, the researcher
should always consider the conceptual underpinnings of the variables and use judgment as to the
appropriateness of the variables for factor analysis.
Variable Specification
• In both uses of exploratory factor analysis, the researcher implicitly specifies the potential
dimensions that can be identified through the character and nature of the variables submitted to
factor analysis.
• For example, in assessing the dimensions of store image, if no questions on store personnel were
included, factor analysis would not be able to identify this dimension.



Factors are always produced
• Exploratory factor analysis will always produce factors.
• Exploratory factor analysis is always a potential candidate for the “garbage in, garbage out”
phenomenon.
• If the researcher indiscriminately includes a large number of variables and hopes that factor
analysis will “figure it out,” then the possibility of poor results is high.
• The quality and meaning of the derived factors reflect the conceptual underpinnings of the
variables included in the analysis.
• Thus, researchers should use judgment on which set of variables to analyze together and
understand that the researcher dictates the structure being examined.
• For example, a set of variables representing attitudes and opinions would in most instances not be
analyzed with another set of variables representing actual behaviors as these are distinctly
different sets of variables.



Factors require multiple variables
• The researcher has ultimate control on which variables are subjected to factor analysis.
• In some cases, only a single variable is available to measure a concept, so there is no need to
place that variable in a factor analysis (e.g., probably only a single measure of employee
turnover: Yes or No).
• Such individual variables should not be put into the exploratory factor analysis, but instead
“mixed” with the composite scores from data reduction as needed.
• Obviously, the use of exploratory factor analysis as a data summarization technique is based on
some conceptual basis for the variables analyzed.
• But even if used primarily for data reduction with no interpretation in the data summarization
process, factor analysis is most efficient when conceptually defined dimensions can be represented
by the derived factors.



Stage 2: Designing an Exploratory Factor Analysis

• The design of an exploratory factor analysis involves three basic decisions:
(1) design of the study in terms of the number of variables, measurement properties of variables,
and the types of allowable variables;
(2) the sample size necessary, both in absolute terms and as a function of the number of variables
in the analysis; and
(3) calculation of the input data (a correlation matrix) to meet the specified objectives of grouping
variables or respondents
Variable Selection and Measurement Issues
• Two specific questions must be answered at this point:
(1) What types of variables can be used in factor analysis? and
(2) How many variables should be included?



Variable Selection and Measurement Issues
• Types of variables included
• The primary requirement is that a correlation value can be calculated among all variables.
• Metric variables are easily measured by several types of correlations.
• Nonmetric variables, however, are more problematic because they cannot use the same types of
correlation measures used by metric variables.
• Some specialized methods calculate correlations among nonmetric variables, but the most prudent
approach is to avoid nonmetric variables.
• If a nonmetric variable must be included, use dummy variables (coded 0–1) to represent categories
of nonmetric variables.
• One drawback of this approach, however, is that there is no way for the program to ensure that all of
the dummy variables created for a single multi-category variable are represented in a single factor.
• If all the variables are dummy variables, then specialized forms of factor analysis, such as Boolean
factor analysis, are more appropriate.
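
The 0–1 coding of a nonmetric variable can be sketched as follows. This is an illustrative helper, not taken from the slides: the function name `dummy_code`, its `drop_first` option and the sample responses are all hypothetical.

```python
def dummy_code(values, drop_first=False):
    """Return a dict mapping each category to its 0-1 indicator column.

    The indicator columns of one variable always sum to 1 for every case,
    so one category is often left out as the reference (drop_first=True).
    That choice is a common practice assumed here, not a rule from the slides.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical responses for a nonmetric variable (mode of travel).
mode_of_travel = ["bus", "car", "bus", "walk", "car"]

dummies = dummy_code(mode_of_travel)
for category, column in dummies.items():
    print(category, column)
```

Each resulting column can enter the correlation matrix like any other variable, subject to the drawback noted above that the columns of one variable need not load on the same factor.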



Variable Selection and Measurement Issues
• Number of variables to include
• Include a reasonable number of variables per factor.
• If a study is being designed to assess a proposed structure (i.e., scale development), be sure to
include several variables (five or more) that may represent each proposed factor.
• The strength of exploratory factor analysis lies in finding patterns among groups of variables, and it
is of little use in identifying factors composed of only a single variable.
• If possible, identify several key variables (sometimes referred to as key indicants or marker
variables) that closely reflect the hypothesized underlying factors.
• This identification will aid in validating the derived factors and assessing whether the results have
practical significance.



Sample Size
• There are guidelines based on
(1) the absolute size of the dataset,
(2) the ratio of cases to variables and
(3) the “strength” of the factor analysis results.
• In terms of absolute size, researchers generally would not factor analyze a sample of fewer than 50
observations, and preferably the sample size should be 100 or larger.
• Researchers have suggested much larger samples (200 and larger) as the number of variables and
expected number of factors increases.
• In terms of the ratio of observations to variables, the general rule is to have a minimum of five times
as many observations as the number of variables to be analyzed, and a more acceptable sample size
would have a 10:1 ratio.
• Some researchers even propose a minimum of 20 cases for each variable.
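
The rules of thumb above can be collected into a small check. This is a sketch only; the function name `sample_size_check` and the example figures (150 observations, 20 variables) are hypothetical.

```python
def sample_size_check(n_obs, n_vars):
    """Compare a planned sample against the rules of thumb listed above."""
    ratio = n_obs / n_vars
    return {
        "at least 50 observations": n_obs >= 50,
        "at least 100 observations (preferred)": n_obs >= 100,
        "5:1 observations per variable (minimum)": ratio >= 5,
        "10:1 observations per variable (more acceptable)": ratio >= 10,
        "20:1 observations per variable (strict proposal)": ratio >= 20,
    }


# Hypothetical study: 150 respondents and 20 variables (ratio 7.5:1).
report = sample_size_check(n_obs=150, n_vars=20)
for rule, satisfied in report.items():
    print(f"{rule}: {'met' if satisfied else 'not met'}")
```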



Sample Size
• If the objective of the exploratory factor analysis is to assess preliminary structural patterns, such as
in a pilot test for a questionnaire, these ratios can be adjusted downward.
• Guidelines on sample size can also be based on the “strength” of the exploratory factor analysis results.
• One measure of how well a variable is accounted for in the retained factors is communality, the
amount of a variable’s variance explained by its loadings on the factors.
• The communality is calculated as the sum of the squared loadings across the factors.
• A communality of 0.70, for example, can only occur with at least one fairly high loading (e.g., the
square root of 0.70 is about 0.84), so high communalities generally denote factor results that exhibit a
series of high loadings for each variable.
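
As a worked illustration of the sum-of-squared-loadings rule, the sketch below computes the communality of each attribute from the two-factor loadings tabulated in Example 3. The variable names are illustrative.

```python
# Factor loadings (Factor 1, Factor 2) from Example 3.
loadings = {
    "Taste": (0.56, 0.82),
    "Good buy for money": (0.78, -0.53),
    "Flavour": (0.65, 0.75),
    "Suitable for snack": (0.94, -0.10),
    "Provides lots of energy": (0.80, -0.54),
}

# Communality = sum of the squared loadings across the factors.
communalities = {
    name: sum(loading ** 2 for loading in row)
    for name, row in loadings.items()
}

for name, h2 in communalities.items():
    print(f"{name}: communality = {h2:.2f}")
```

For Taste, for example, 0.56² + 0.82² = 0.3136 + 0.6724 = 0.986, so the two factors account for almost all of that variable's variance.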



Sample Size
• Fabrigar and Wegener guidelines on sample size:
(a) a sample size of 100 is sufficient if all the communalities are 0.70 or above and there are at
least three variables with high loadings on each factor;
(b) as the communalities fall to the range of 0.40 to 0.70 then the sample size should be at least
200; and
(c) if the communalities are below 0.40 and there are few high loadings per factor, sample sizes of
up to 400 are appropriate.
• In each of these guidelines we can see that the necessary sample size increases as the complexity
of the factor analysis increases.
• For example, 30 variables require computing 435 correlations (30 × 29 / 2) in the factor analysis.
• At a 0.05 significance level, about 20 of those correlations (0.05 × 435 ≈ 22) would be deemed
significant and appear in the factor analysis just by chance.
• Thus, the researcher should always try to obtain the highest cases-per-variable ratio to minimize
the chances of overfitting the data (i.e., deriving factors that are sample-specific with little
generalizability).
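
The arithmetic behind the 30-variable example can be reproduced directly. This is a small sketch; the function name `n_correlations` is illustrative, and the expected count assumes that no true correlations exist among the variables.

```python
from math import comb


def n_correlations(n_vars):
    """Number of distinct pairwise correlations among n_vars variables."""
    return comb(n_vars, 2)  # n_vars * (n_vars - 1) / 2


n_corr = n_correlations(30)
alpha = 0.05

# Expected number of correlations flagged as significant purely by chance.
expected_by_chance = alpha * n_corr

print(f"Distinct correlations among 30 variables: {n_corr}")
print(f"Expected significant by chance at alpha = {alpha}: {expected_by_chance:.2f}")
```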



Stage 3: Assumptions in Exploratory Factor Analysis
• The critical assumptions underlying exploratory factor analysis are more conceptual than statistical
• In exploratory factor analysis the overriding concerns center as much on the character and
composition of the variables included in the analysis as on their statistical qualities.
Conceptual Issues
• The conceptual assumptions underlying factor analysis relate to the set of variables selected and the
sample chosen.
• A basic assumption of factor analysis is that some underlying structure does exist in the set of
selected variables.
• The presence of correlated variables and the subsequent definition of factors do not guarantee
relevance, even if they meet the statistical requirements.
• It is the responsibility of the researcher to ensure that the observed patterns are conceptually valid
and appropriate to study with exploratory factor analysis, because the technique has no means of
determining appropriateness other than the correlations among variables.



Conceptual Issues
• For example, mixing dependent and independent variables in a single factor analysis and then using
the derived factors to support dependence relationships is not appropriate.
• The researcher must also ensure that the sample is homogeneous with respect to the underlying
factor structure.
• For example, it is inappropriate to apply exploratory factor analysis to a sample of males and
females for a set of items known to differ because of gender.
• When the two subsamples (males and females) are combined, the resulting correlations and factor
structure will be a poor representation (and likely incorrect) of the unique structure of each group.
• Thus, whenever groups are included in the sample that are expected to have different items
measuring the same concepts, separate factor analyses should be performed, and the results
should be compared to identify differences not reflected in the results of the combined sample.



Statistical Issues
• From a statistical standpoint, departures from normality, homoscedasticity, and linearity apply only
to the extent that they diminish the observed correlations.
• Normality is necessary only if a statistical test is applied to the significance of the factors, but
these tests are rarely used.
• Multicollinearity is desirable, because the objective is to identify interrelated sets of variables.
• Degree of interrelatedness is assessed from the perspectives of both the overall correlation matrix
and individual variables.



Appropriateness of factor analysis
• Overall Measures of Intercorrelation
• The researcher must also ensure that the data matrix has sufficient correlations to justify the
application of exploratory factor analysis.
• If it is found that all of the correlations are low, or that all of the correlations are equal (denoting
that no structure exists to group variables), then the application of exploratory factor analysis is
questionable.
Visual Inspection
• If visual inspection reveals only a small number of correlations among the variables greater than 0.30,
then exploratory factor analysis is probably inappropriate.
• The correlations among variables can also be analyzed by computing the partial correlations among
variables.
• A partial correlation is the correlation that is unexplained when the effects of other variables are
taken into account.
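
A minimal sketch of a partial correlation, using three attributes from Example 3. It applies the standard first-order formula, which controls for a single third variable only, whereas in factor analysis the partial correlations control for all other variables in the set. The function name is illustrative.

```python
from math import sqrt


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Correlations taken from the Example 3 matrix.
r_buy_snack = 0.71     # Good buy for money vs Suitable for snack
r_buy_energy = 0.85    # Good buy for money vs Provides lots of energy
r_snack_energy = 0.79  # Suitable for snack vs Provides lots of energy

partial = first_order_partial(r_buy_snack, r_buy_energy, r_snack_energy)
print(f"Simple correlation:  {r_buy_snack:.2f}")
print(f"Partial correlation: {partial:.2f} (controlling for energy)")
```

Here the simple correlation of 0.71 falls to roughly 0.12 once the third variable is controlled for, which is the pattern expected when variables share a common underlying factor: little correlation is left unexplained.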



Appropriateness of factor analysis
Bartlett Test
• Examines the entire correlation matrix.
• The Bartlett test of sphericity is a statistical test for the presence of correlations among the
variables.
• It provides the statistical significance indicating the correlation matrix has significant
correlations among at least some of the variables.
• Increasing the sample size causes the Bartlett test to become more sensitive in detecting
correlations among the variables.
Measure of Sampling Adequacy
• This index ranges from 0 to 1, reaching 1 when each variable is perfectly predicted without error
by the other variables.
– 0.80 or above - meritorious;
– 0.70 or above - middling;
– 0.60 or above - mediocre;
– 0.50 or above - miserable; and
– below 0.50 - unacceptable
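
Both diagnostics can be computed from the correlation matrix alone. The sketch below is not from the slides: it assumes NumPy is available, uses the commonly cited formulas (Bartlett's chi-square approximation based on the determinant of the correlation matrix, and the measure of sampling adequacy as the ratio of squared correlations to squared correlations plus squared partial correlations), and applies them to the Example 3 matrix. The sample size of 100 is hypothetical because the slides do not state it.

```python
import numpy as np


def bartlett_sphericity(R, n_obs):
    """Bartlett's test statistic and degrees of freedom for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of |det(R)|, numerically stable
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof


def sampling_adequacy(R):
    """Overall and per-variable measure of sampling adequacy (MSA)."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations given all others
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diag, R ** 2, 0.0)
    a2 = np.where(off_diag, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + a2.sum())
    return overall, per_variable


# Attribute correlation matrix from Example 3.
R_example = np.array([
    [1.00, 0.02, 0.96, 0.42, 0.01],
    [0.02, 1.00, 0.13, 0.71, 0.85],
    [0.96, 0.13, 1.00, 0.50, 0.11],
    [0.42, 0.71, 0.50, 1.00, 0.79],
    [0.01, 0.85, 0.11, 0.79, 1.00],
])

statistic, dof = bartlett_sphericity(R_example, n_obs=100)  # n_obs is hypothetical
overall_msa, variable_msa = sampling_adequacy(R_example)

print(f"Bartlett statistic = {statistic:.1f} on {dof} degrees of freedom")
print(f"Overall MSA = {overall_msa:.3f}")
print("Variable-specific MSA:", np.round(variable_msa, 3))
```

The Bartlett statistic is compared with a chi-square distribution on the stated degrees of freedom; the MSA values are read against the bands listed above.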



Appropriateness of factor analysis
• The MSA increases as
(1) the sample size increases,
(2) the average correlations increase,
(3) the number of variables increases, or
(4) the number of factors decreases [35].
• Overall MSA value should be above 0.50 before proceeding with the factor analysis.
• If the MSA value falls below 0.50, then the variable-specific MSA values can identify variables for
deletion to achieve an overall value of 0.50.



Stage 4: Deriving Factors and Assessing Overall Fit
• Once the variables are specified and the correlation matrix is prepared, the researcher is ready to
apply exploratory factor analysis to identify the underlying structure of relationships.
• Decisions must be made concerning
(1) the method of extracting the factors (common factor analysis versus principal components
analysis) and
(2) the number of factors selected to represent the underlying structure in the data.



Stage 4: Deriving Factors and Assessing Overall Fit
• Partitioning the Variance of a Variable
• Variance is a value (i.e., the square of the standard deviation) that represents the total amount of
dispersion of values for a single variable about its mean.
• When a variable is correlated with another variable, we often say it shares variance with the other
variable, and the amount of sharing between just two variables is simply the squared correlation.
• For example, if two variables have a correlation of 0.50, each variable shares 25 percent
(0.50² = 0.25) of its variance with the other variable.
• In exploratory factor analysis, variables are grouped by their correlations, such that variables in a
group (factor) have high correlations with each other.
• Thus, it is important to understand how much of a variable’s variance is shared with other variables
in that factor versus what cannot be shared (i.e., the unexplained portion).



Stage 4: Deriving Factors and Assessing Overall Fit
• The total variance of any variable can be divided (partitioned) into three types of variance:
Common Versus Unique Variance
• Common variance is that variance in a variable that is shared with all other variables in the analysis.
This variance is accounted for (shared) based on the variable’s correlations with all other variables in
the analysis.
• A variable’s communality is the estimate of its shared, or common, variance among the variables as
represented by the derived factors.
• Unique variance is that variance associated with only a specific variable and is not represented in
the correlations among variables.
• Variables with high common variance are more amenable to exploratory factor analysis since they
correlate more with the other variables in the analysis.
• The measure of sampling adequacy (MSA) values reflect the amount of common variance and so provide
an objective way to assess it.



4: Deriving Factors and Assessing overall Fit
• Unique Variance Composed Of Specific And Error Variance
• It is useful to separate unique variance into its two potential sources, specific and error, so as to
understand ways in which the common variance may be increased or the error variance mitigated in
some fashion.
• Specific variance cannot be explained by the correlations to the other variables but is still associated
uniquely with a single variable.
• It reflects the unique characteristics of that variable apart from the other variables in the analysis.
• Error variance is also variance that cannot be explained by correlations with other variables, but it is
due to unreliability in the data-gathering process, measurement error, or a random component in the
measured phenomenon.
• While making a precise estimate of the breakdown of unique variance into specific and error variance is
not required for the analysis, understanding the sources of unique variance is important for assessing a
variable in the factor results, particularly in the scale development process.



4: Deriving Factors and Assessing overall Fit
• The total variance of any variable has two basic sources, common and unique, with unique
variance having two sub-parts, specific and error.
• As a variable is more highly correlated with one or more variables, the common variance
(communality) increases.
• However, if unreliable measures or other sources of extraneous error variance are introduced, then
the amount of possible common variance and the ability to relate the variable to any other variable
are reduced.
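
In symbols, the partition can be sketched as below. The notation is an assumption introduced here and does not come from the slides: $h_i^2$ is the communality of variable $X_i$, $s_i^2$ its specific variance, and $e_i^2$ its error variance. For a standardized variable the total variance is 1.

```latex
\operatorname{Var}(X_i)
  = \underbrace{h_i^2}_{\text{common}}
  + \underbrace{s_i^2 + e_i^2}_{\text{unique}}
  = 1
```

Under this sketch, anything that raises $e_i^2$ (for example, an unreliable measure) necessarily lowers the largest value $h_i^2$ can take.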



Common Factor Analysis Versus Principal Component Analysis
• The selection of one method over the other is based on two criteria:
(1) the objectives of the factor analysis and
(2) the amount of prior knowledge about the variance in the variables.
• Principal component analysis is used when the objective is to summarize most of the original
information (variance) in a minimum number of factors for prediction purposes.
• Common factor analysis is used primarily to identify underlying factors or dimensions that reflect
what the variables share in common.
Principal Components Analysis
• Considers the total variance and derives factors that contain small proportions of unique variance
and, in some instances, error variance.



Common Factor Analysis
• Common factor analysis considers only the common or shared variance, assuming that both the
specific and error variance (together, the unique variance) are not of interest in defining the structure
of the variables.
• To employ only common variance in the estimation of the factors, communalities (instead of unities)
are inserted in the diagonal of the correlation matrix.
• Thus, factors resulting from common factor analysis are based only on the common variance.
• Common factor analysis excludes a portion of the variance included in a principal component
analysis.
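
The diagonal substitution can be sketched as follows. The slides do not say how the initial communalities are estimated, so this example assumes the squared multiple correlation (SMC) of each variable with all the others, which is one common starting value. The function name and the small correlation matrix are hypothetical.

```python
import numpy as np

def reduced_correlation_matrix(R):
    """Replace the unit diagonal of R with initial communality estimates.

    Assumption: communalities are estimated by squared multiple correlations,
    SMC_i = 1 - 1 / (R^-1)_ii. Other estimates are possible.
    """
    R = np.asarray(R, dtype=float)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    return R_reduced, smc

# Hypothetical correlation matrix for three variables.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
R_reduced, smc = reduced_correlation_matrix(R)
print(np.round(smc, 3))        # communality estimates, each below 1
print(np.round(R_reduced, 3))  # same correlations, reduced diagonal
```

Principal component analysis would factor `R` itself (unities on the diagonal), while common factor analysis would factor `R_reduced`, which is why it works with less total variance.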



• Principal component analysis is most appropriate when:
– data reduction is a primary concern
– prior knowledge suggests that specific and error variance represent a relatively small proportion
of the total variance, or
– the principal component results are used as a preliminary step in the scale development process

• Common factor analysis is most appropriate when:
– the primary objective is to identify the latent dimensions or constructs represented in the
common variance of the original variables, as typified in the scale development process, and
– the researcher has little knowledge about the amount of specific and error variance and
therefore wishes to eliminate this variance.

• In most applications, both principal component analysis and common factor analysis arrive at
essentially identical results if the number of variables exceeds 30 or the communalities exceed 0.60
for most variables.



Criteria for the number of factors to extract
• Both factor analysis methods are interested in the best linear combination of variables.
• Best in the sense that the particular combination of original variables accounts for more of the
variance in the data as a whole than any other linear combination of variables.
• Therefore, the first factor may be viewed as the single best summary of linear relationships
exhibited in the data.
• The second factor is defined as the second-best linear combination of the variables, subject to the
constraint that it is orthogonal to the first factor.
• To be orthogonal to the first factor, the second factor must be derived only from the variance
remaining after the first factor has been extracted.
• Thus, the second factor may be defined as the linear combination of variables that accounts for the
most variance that is still unexplained after the effect of the first factor has been removed from the
data.
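
For the principal components method, this successive extraction corresponds to an eigendecomposition of the correlation matrix. The sketch below uses simulated data (hypothetical, generated from two latent factors) to show that each component accounts for no more variance than the one before it and that the components are mutually orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of 6 variables driven by 2 latent factors.
F = rng.normal(size=(200, 2))
W = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.2],
              [0.1, 0.8], [0.0, 0.7], [0.2, 0.6]])
X = F @ W.T + 0.5 * rng.normal(size=(200, 6))

R = np.corrcoef(X, rowvar=False)          # 6 x 6 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]         # re-sort: largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Variance accounted for by each successive component (non-increasing).
print(np.round(eigvals, 3))
# The components are orthogonal to one another.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(6)))
```

The eigenvalues sum to the number of variables (6 here), because each standardized variable contributes a variance of 1.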



Criteria for the number of factors to extract
• The process continues extracting factors accounting for smaller and smaller amounts of variance
until all of the variance is explained.
• For example, the components method actually extracts n factors, where n is the number of variables
in the analysis. Thus, if 30 variables are in the analysis, 30 factors are extracted.
• With 30 store image variables and 30 extracted factors, the first few factors will hopefully account for
a substantial enough portion of the variance so that the researcher can retain only a small number of
factors to adequately represent the variance of the entire set of variables.
• How many factors to extract or retain?
• Based on a conceptual foundation (How many factors should be in the structure?) with some
empirical evidence (How many factors can be reasonably supported?).



Criteria for the number of factors to extract
• Stopping rules for the number of factors to extract
• A Priori Criterion
• A priori criterion is a simple yet reasonable criterion under certain circumstances.
• The researcher already knows how many factors to extract before undertaking the factor analysis.
The researcher simply instructs the computer to stop the analysis when the desired number of
factors has been extracted.
• This approach is useful when testing a theory or hypothesis about the number of factors to be
extracted.
• It also can be justified in attempting to replicate another researcher’s work and extract the same
number of factors that was previously found.



Criteria for the number of factors to extract
Latent Root Criterion
• The most commonly used technique, also known as the Kaiser rule.
• This technique is simple to apply, the rationale being that any individual factor should account for the
variance of at least a single variable if it is to be retained for interpretation.
• With principal component analysis each variable by itself contributes a value of 1 (i.e., the value on
the diagonal of the correlation matrix) to the total eigenvalue for all variables.
• The simple rule is: Don’t retain any factors which account for less variance than a single variable.
• Thus, only the factors having latent roots or eigenvalues greater than 1 are considered significant;
• All factors with latent roots less than 1 are considered insignificant and are disregarded.
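
A minimal sketch of the latent root criterion follows. The eigenvalues are hypothetical values for a six-variable correlation matrix (so they sum to 6), and the function name is an assumption made for this example.

```python
def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of factors whose eigenvalue (latent root) exceeds the cutoff."""
    return sum(1 for ev in eigenvalues if ev > cutoff)

# Hypothetical eigenvalues of a 6-variable correlation matrix.
eigenvalues = [2.9, 1.6, 0.6, 0.4, 0.3, 0.2]
print(kaiser_count(eigenvalues))  # 2 factors have eigenvalues above 1
```

Only the first two factors account for more variance than a single variable, so only they would be retained under this rule.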



Criteria for the number of factors to extract
• This rule is most applicable to principal components analysis where the diagonal value representing
the amount of variance for each variable is 1.0.
• In common factor analysis the diagonal value is replaced with the communality (amount of variance
explained) of the variable.
• In most instances this is a value less than 1.0, so using the latent root criterion on this form of the
correlation matrix would be less appropriate.
• So in common factor analysis many times the latent root criterion is applied to the factor results
before the diagonal value is replaced by the communality.
• Another approach is to lower the eigenvalue cutoff below one, choosing a level approximately equal
to the average of the communalities of the items.



Criteria for the number of factors to extract
• The latent root criterion is most reliable when the number of variables is between 20 and 50 and the
communalities are above 0.40.
• If the number of variables is less than 20, the tendency for this method is to extract a conservative
number of factors (too few).
• In contrast, if more than 50 variables are involved, it is not uncommon for too many factors to be
extracted
• In most instances, the latent root criterion is applied as a first step, and then other criteria are
considered in combination with this initial criterion



Criteria for the number of factors to extract
Percentage of Variance Criterion
• The percentage of variance criterion is an approach based on achieving a specified cumulative
percentage of total variance extracted by successive factors.
• The purpose is to ensure practical significance for the derived factors by ensuring that they explain
at least a specified amount of variance.
• No absolute threshold has been adopted for all applications.
• In the natural sciences, the factoring procedure usually should not be stopped until the extracted
factors account for at least 95 percent of the variance, or until the last factor accounts for only a
small portion (less than 5%).
• In the social sciences, where information is often less precise, it is not uncommon to consider a
solution that accounts for 60 percent of the total variance as satisfactory, and in some instances
even less.
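
The cumulative-percentage rule can be sketched as below. The eigenvalues are hypothetical, and the function name and the `target` parameter are assumptions made for this example. The two targets shown are the 60 and 95 percent levels mentioned above.

```python
def factors_for_variance(eigenvalues, target):
    """Smallest number of factors whose cumulative share of variance reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= target:
            return count
    return len(ordered)  # guard against rounding when target is 1.0

# Hypothetical eigenvalues of a 6-variable correlation matrix.
eigenvalues = [2.9, 1.6, 0.6, 0.4, 0.3, 0.2]
print(factors_for_variance(eigenvalues, target=0.60))  # 2
print(factors_for_variance(eigenvalues, target=0.95))  # 5
```

With these values, two factors explain 75 percent of the variance, which would satisfy a 60 percent target, while a 95 percent target would require five of the six factors.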



Criteria for the number of factors to extract
Scree Test Criterion
• Recall that with the component analysis factor model the later factors extracted contain both
common and unique variance.
• Although all factors contain at least some unique variance, the proportion of unique variance is
substantially higher in later factors.
• The scree test is used to identify the optimum number of factors that can be extracted before the
amount of unique variance begins to dominate the common variance structure.
• The scree test is derived by plotting the latent roots against the number of factors in their order of
extraction, and
• the shape of the resulting curve is used to evaluate the cut-off point.



Criteria for the number of factors to extract
• The figure shows the first 18 factors extracted in a study.
• Starting with the first factor, the plot slopes steeply downward initially and then slowly becomes an
approximately horizontal line. The inflection point between the two is termed by many the “elbow.”
• The point at which the curve first begins to straighten out is considered to mark the factors that
contain more unique than common variance and thus are less suitable for retention.
• Most researchers do not include the elbow, but rather retain all of the preceding factors.
• So in this instance, if the elbow were at 10 factors, then 9 factors would be retained.
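
A scree plot only needs the eigenvalues in their order of extraction. The sketch below draws a rough text version so that no plotting library is required. The function name and the eigenvalues are hypothetical, and locating the elbow remains a visual judgment that this code does not attempt.

```python
def text_scree(eigenvalues, width=40):
    """Return the lines of a text scree plot: one bar per factor, in extraction order."""
    top = max(eigenvalues)
    lines = []
    for number, ev in enumerate(eigenvalues, start=1):
        bar = "#" * max(1, round(width * ev / top))
        lines.append(f"{number:2d} {ev:5.2f} {bar}")
    return lines

# Hypothetical eigenvalues: a steep drop followed by a nearly flat tail.
eigenvalues = [4.2, 2.6, 1.5, 0.6, 0.5, 0.45, 0.4, 0.35]
print("\n".join(text_scree(eigenvalues)))
```

In this made-up example the bars shrink quickly over the first three factors and level off from the fourth, so a reader applying the rule above would place the elbow at factor 4 and retain three factors.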



Interpreting the Factors
• Most applications of exploratory factor analysis involve data summarization with interpretation.
• No unequivocal processes or guidelines determine the interpretation of factors.
• The researcher with a strong conceptual foundation for the anticipated structure and its rationale
has the greatest chance of success.
• The researcher must repeatedly make subjective judgments in decisions such as the number of
factors to extract, the number of relationships sufficient to warrant grouping variables, and how to
identify the groupings.
• It is therefore up to the researcher to be the final arbiter as to the form and appropriateness of a
factor solution, and such decisions are best guided by conceptual rather than empirical bases.
• To assist in the process of interpreting a factor structure and selecting a final factor solution, three
fundamental processes are described.



Interpreting the Factors
• Within each process, several substantive issues (factor rotation, factor-loading significance,
and factor interpretation) are encountered.
• Estimate the Factor Matrix
• First, the initial unrotated factor matrix is computed, containing the factor loadings for each
variable on each factor.
• Factor loadings are the correlation of each variable and the factor.
• Loadings indicate the degree of correspondence between the variable and the factor, with
higher loadings making the variable representative of the factor.
• In exploratory factor analysis, each variable has loadings on all factors.
• Factor loadings are the means of interpreting the role each variable plays in defining each
factor
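
The statement that loadings are correlations between variables and factors can be checked numerically for the principal components method. The sketch below uses simulated data (hypothetical) and computes the unrotated loadings as eigenvectors scaled by the square root of their eigenvalues, then compares them with the directly computed correlations.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:, 1] += 0.8 * X[:, 0]      # make variables 0 and 1 correlated
X[:, 3] += 0.8 * X[:, 2]      # make variables 2 and 3 correlated

Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardized variables
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)         # unrotated component loadings
scores = Z @ eigvecs                          # component scores per observation

# Correlation of every variable (rows) with every component (columns).
corr = np.array([[np.corrcoef(Z[:, i], scores[:, j])[0, 1]
                  for j in range(4)] for i in range(4)])
print(np.allclose(loadings, corr))
```

Every variable has a loading on every component, which matches the point that in exploratory factor analysis each variable loads on all factors.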



Interpreting the Factors
• Factor Rotation
• The unrotated factor solutions achieve the objective of data reduction.
• Will the unrotated factor solution (which fulfills desirable mathematical requirements) provide
information that offers the most adequate interpretation of the variables under examination?
• In most instances the answer to this question is no; factor rotation should simplify the
factor structure (i.e., have each variable load highly on only one factor).
• In most cases rotation of the factors improves the interpretation by reducing some of the
ambiguities that often accompany initial unrotated factor solutions.



Interpreting the Factors
• Factor Interpretation and Respecification
• As a final process, evaluate the (rotated) factor loadings for each variable in order to determine that
variable’s role and contribution in determining the factor structure.
• The need may arise to respecify the factor model owing to
(1) the deletion of a variable(s) from the analysis,
(2) the desire to employ a different rotational method for interpretation,
(3) the need to extract a different number of factors, or
(4) the desire to change from one extraction method to another.
• Respecification of a factor model involves returning to the extraction stage, extracting factors, and
then beginning the process of interpretation once again.



Rotation Of Factors
• Perhaps the most important tool in interpreting factors is factor rotation.
• Specifically, the reference axes of the factors are turned about the origin until some other position
has been reached.
• Orthogonal factor rotation: the axes are maintained at 90 degrees.



Rotation Of Factors
• The vertical axis represents unrotated factor II, and the
horizontal axis represents unrotated factor I.
• The axes are labeled with 0 at the origin and extend
outward to -1.0 or +1.0.
• The numbers on the axes represent the factor loadings.
• The five variables are labeled V1, V2, V3, V4 and V5.
• The factor loadings for a variable are obtained by drawing horizontal and vertical lines from the
variable's point to the axes.
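
For two factors, an orthogonal rotation amounts to multiplying the loading matrix by a 2 × 2 rotation matrix. The sketch below uses hypothetical loadings for V1 to V5 (not the values in the figure) and an arbitrary angle, and shows that each variable's communality (the sum of its squared loadings) is unchanged by the rotation.

```python
import numpy as np

def rotate_loadings(loadings, degrees):
    """Rotate two factor axes by the given angle; the axes stay at 90 degrees."""
    t = np.radians(degrees)
    T = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return np.asarray(loadings, dtype=float) @ T

# Hypothetical unrotated loadings of V1..V5 on factors I and II.
L = np.array([[0.50,  0.80],
              [0.60,  0.70],
              [0.90, -0.25],
              [0.80, -0.30],
              [0.60, -0.50]])
L_rot = rotate_loadings(L, degrees=35)   # the angle is arbitrary here

print(np.round(L_rot, 2))
print(np.allclose((L ** 2).sum(axis=1), (L_rot ** 2).sum(axis=1)))
```

Rotation redistributes each variable's variance between the two factors without changing how much of it the pair of factors explains in total.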



Rotation Of Factors
• Oblique factor rotation
• The axes are rotated without retaining the 90-degree angle between the reference axes.



Rotation Of Factors
• Orthogonal Rotation Methods
• VARIMAX
• EQUIMAX
• Oblique Rotation Methods
• IBM SPSS provides OBLIMIN
• SAS has PROMAX and ORTHOBLIQUE



Significance of factor loadings
• A factor loading is the correlation of the variable and the factor
• The squared loading is the amount of the variable’s total variance accounted for by the factor.
• Thus, a 0.30 loading translates to approximately 10 percent explanation, and a 0.50 loading denotes that
25 percent of the variance is accounted for by the factor. The loading must exceed 0.70 for the factor to
account for 50 percent of the variance of a variable.
• Thus, the larger the absolute size of the factor loading, the more important the loading in interpreting
the factor matrix
• Factor loadings less than 0.10 can be considered equivalent to zero for purposes of assessing simple
structure.
• Factor loadings in the range of 0.30 to 0.40 are considered to meet the minimal level for interpretation
of structure.
• Loadings 0.50 or greater are considered practically significant.
• Loadings exceeding 0.70 are considered indicative of well-defined structure and are the goal of any
factor analysis
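
The arithmetic behind these thresholds can be sketched as follows. The function names are assumptions made for this example. The communality calculation assumes orthogonal factors, for which a variable's communality is the sum of its squared loadings.

```python
def variance_explained(loading):
    """Share of a variable's variance accounted for by one factor."""
    return loading ** 2

def communality(loadings_for_variable):
    """Variance of one variable explained by all retained (orthogonal) factors."""
    return sum(loading ** 2 for loading in loadings_for_variable)

for loading in (0.30, 0.50, 0.70):
    print(f"loading {loading:.2f} -> {variance_explained(loading):.0%} of variance")

# Hypothetical variable loading 0.60 on one factor and 0.50 on another.
print(round(communality([0.60, 0.50]), 2))
```

The three loadings correspond to 9, 25, and 49 percent of the variance, which is why a loading must slightly exceed 0.70 before a factor accounts for half of a variable's variance.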



Significance of factor loadings
• For example, in a sample of 100 respondents, factor loadings of 0.55 and above are significant.
• However, in a sample of 50, a factor loading of 0.75 is required for significance.
• In comparison with the prior rule of thumb, which denoted all loadings of 0.30 as having practical
significance, this approach would consider loadings of 0.30 significant only for sample sizes of 350 or
greater.



Factor analysis - Example

Measure of sampling adequacy (MSA) guidelines:
0.80 or above - meritorious;
0.70 or above - middling;
0.60 or above - mediocre;
0.50 or above - miserable; and
below 0.50 - unacceptable
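
These labels are the usual verbal ratings for the measure of sampling adequacy (MSA). A minimal lookup using only the thresholds listed above might look like this; the function name is hypothetical.

```python
def msa_rating(msa):
    """Map a measure of sampling adequacy (MSA) value to its verbal rating."""
    if not 0.0 <= msa <= 1.0:
        raise ValueError("MSA must lie between 0 and 1")
    for threshold, label in ((0.80, "meritorious"), (0.70, "middling"),
                             (0.60, "mediocre"), (0.50, "miserable")):
        if msa >= threshold:
            return label
    return "unacceptable"

for value in (0.85, 0.65, 0.45):
    print(value, msa_rating(value))
```

The same lookup can be applied to the overall MSA or to the MSA of an individual variable.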


• general intelligence
• mathematical-ability



• Nutritional factor
• Taste factor
Factor Rotation

• When the first factor solution does not reveal the hypothesized structure of the loadings, it
is customary to apply rotation in an effort to find another set of loadings that fit the
observations equally well but can be more easily interpreted

• Perhaps the most widely used of these is the Varimax criterion

• It seeks the rotated loadings that maximize the variance of the squared loadings for each
factor

• The goal is to make some of these loadings as large as possible and the rest as small as
possible in absolute value



Factor Rotation

• The Varimax method encourages the detection of factors each of which is related to few
variables

• It discourages the detection of factors influencing all variables

• Quartimax criterion, on the other hand, seeks to maximize the variance of the squared
loadings for each variable, and tends to produce factors with high loadings for all variables
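
Both criteria belong to the orthomax family and can be sketched with one routine. The code below uses a commonly seen SVD-based iteration and is an illustration under assumptions, not the procedure of any particular statistical package: `gamma=1.0` is taken to give Varimax and `gamma=0.0` Quartimax, and the loading matrix is hypothetical.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonally rotate a loading matrix (gamma=1: Varimax, gamma=0: Quartimax).

    Returns the rotated loadings and the rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ T
        # Matrix proportional to the gradient of the orthomax criterion.
        grad = L.T @ (rotated ** 3
                      - (gamma / p) * rotated * np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                      # nearest orthogonal matrix to the gradient
        current = s.sum()
        if current < previous * (1 + tol):
            break                       # no meaningful improvement: stop
        previous = current
    return L @ T, T

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.sum(np.var(np.asarray(loadings) ** 2, axis=0)))

# Hypothetical unrotated loadings for five variables on two factors.
L = np.array([[0.50, 0.80], [0.60, 0.70], [0.90, -0.25],
              [0.80, -0.30], [0.60, -0.50]])
L_varimax, T = orthomax(L, gamma=1.0)
print(np.round(L_varimax, 2))
print(varimax_criterion(L), varimax_criterion(L_varimax))
```

Because the rotation matrix is orthogonal, the rotated loadings fit the correlations exactly as well as the unrotated ones; only the interpretability of the pattern changes.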



How Many Factors?
• Kaiser (1960) proposed dropping factors whose eigenvalues are less than one since these provide
less information than is provided by a single variable

• Jolliffe (1972) considers Kaiser’s cutoff too large and suggests using a cutoff on the eigenvalues
of 0.7 when correlation matrices are analyzed

• Cattell (1966) documented the scree plot


• Studying this chart is probably the most popular method for determining the number of factors, but
it is subjective, causing different people to analyze the same data with different results



THANK YOU