American Journal of Mathematics and Statistics 2020, 10(2): 44-54

DOI: 10.5923/j.ajms.20201002.03

A Comparison of Principal Component Analysis, Maximum Likelihood and the Principal Axis in Factor Analysis
Onyekachi Akuoma Mabel, Olanrewaju Samuel Olayemi*

Department of Statistics, University of Abuja, Abuja, FCT, Nigeria

Abstract This study aims to identify the best extraction technique among three of the most popular factor extraction methods, Principal Component Analysis (PCA), Maximum Likelihood Estimation (MLE) and Principal Axis Factor Analysis (PAFA), and to compare their performance in terms of reliability and accuracy. To achieve this objective, the three methods were applied under various research contexts. A Monte Carlo method was used to simulate datasets from the five statistical distributions considered in this study: the Normal, Uniform, Exponential, Laplace and Gamma distributions. The level of improvement in the estimates was assessed through the proportion of variance and the sum of squared loadings of the factors/components, within each dataset and across the studied distributions. Different combinations of sample size and number of variables were used for each distribution to compare the three methods. The generated datasets consist of 8 and 20 variables, with 20 and 500 observations for each variable; 8 and 20 variables represent small and large numbers of variables, and 20 and 500 observations represent small and large sample sizes, respectively. The results of applying the procedures to the simulated data confirm that PC analysis is overall the most suitable. The loadings from PCA and PAFA are rather similar and do not differ significantly, although the principal component method yielded factors that load more heavily on the variables which the factors hypothetically represent. Considering these conclusions, it would be natural to recommend PCA over the other extraction methods, even though PAF yields broadly similar results.
Keywords Principal component analysis, Maximum Likelihood, Principal Axis, Factor Analysis

1. Introduction
The origin of factor analysis dates back to work done by Spearman in 1904. At that time psychometricians were deeply involved in attempts to quantify human intelligence, and Spearman's work provided a clever and useful tool that is still at the basis of the most advanced instruments for measuring intelligence. Spearman was responsible for the development of the Two-Factor Theory, in which each variable receives a contribution from two factors: a general factor common to all variables and a specific factor unique to itself. Pearson developed the method of principal axes, later extended by Hotelling into the theory of principal components. Spearman's Two-Factor Theory was eventually superseded by multiple factor analysis, in which several common or group factors are postulated and in which the specific factors are normally absorbed into the error term. From this standpoint, Spearman's Two-Factor Theory may be regarded as a one-factor theory, that is, a theory with one common or group factor. For example, an individual's responses to the questions on a college entrance test are influenced by underlying variables such as intelligence, years in school, age, emotional state on the day of the test, amount of practice taking tests, and so on. The answers to the questions are the observed variables; the underlying, influential variables are the factors.
The factor procedure performs a variety of common factor and component analyses and rotations. Input can be multivariate data, a correlation matrix, a covariance matrix, a factor pattern, or a matrix of scoring coefficients. The procedure can factor either the correlation or the covariance matrix, and most results can be saved in an output data set. Owing to technological advances in computing, factor analysis is now used in many fields, such as the behavioural and social sciences, medicine, economics, and geography.

* Corresponding author: olanrewaju.samuel@uniabuja.edu.ng (Olanrewaju Samuel Olayemi)

factor analysis, unweighted least-squares factor analysis, maximum-likelihood (canonical) factor analysis, alpha factor analysis, image component analysis, and Harris component analysis. A variety of methods for prior communality estimation is also available but, in this study, only principal component analysis, maximum likelihood factor analysis and principal axis factor analysis will be considered.
There are also different methods for factor rotation: varimax, quartimax, parsimax, equamax, orthomax with user-specified gamma, promax with user-specified exponent, Harris-Kaiser case II with user-specified exponent, and oblique Procrustean with a user-specified target pattern. Here too, only varimax will be considered in this study.

1.1. Aims and Objectives
The main aim of this study is to provide researchers with empirically derived guidelines for conducting factor analytic studies in complex research contexts. To enhance the potential utility of this study, the researchers focused on factor extraction methods commonly employed by social scientists: principal component analysis, the maximum likelihood method and the principal axis factor analysis method. To meet this goal, the factor extraction models were subjected to several research conditions, which differed in sample size, number of variables and distribution. Data were simulated under 1000 different conditions; specifically, this study employed a two (sample size) by two (number of variables) by five (distributions) design.
The following are the specific objectives of the study:
(i) To generate artificial samples independently from each of the five distributions.
(ii) To assess how well the hypothesized factors explain the observed data.
(iii) To determine the best extraction method in factor analysis.
(iv) To compare the Principal Components Analysis, Maximum Likelihood Factor Analysis, and Principal Axis Factor Analysis methods of extraction with the varimax rotation method.

1.2. Theoretical Framework

1.2.1. The Orthogonal Factor Model
The aim of factor analysis is to explain the outcome of p variables in the data matrix 𝒙 using fewer variables, called factors. Ideally, all the information in 𝒙 can be reproduced by a smaller number of factors. These factors are interpreted as latent (unobserved) common characteristics of the observed $x \in \mathbb{R}^p$. Let $x = (x_1, \dots, x_p)^T$ be the variables in the population, with mean
$$E(X) = \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}$$
and variance
$$\operatorname{Var}(X) = \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}.$$
The orthogonal factor model is
$$X_{p\times 1} - \mu_{p\times 1} = \lambda_{p\times m} F_{m\times 1} + \varepsilon_{p\times 1} \qquad (2.1)$$

Key Concepts
• F is a latent (i.e. unobserved, underlying) variable.
• The X's are observed (i.e. manifest) variables.
• $\varepsilon_j$ is the measurement error for $X_j$.
• $\lambda_j$ is the "loading" for $X_j$.

$$\lambda = \begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ \vdots & \ddots & \vdots \\ \lambda_{p1} & \cdots & \lambda_{pm} \end{pmatrix}$$
is called the factor loading matrix,
$$F = \begin{pmatrix} F_1 \\ \vdots \\ F_m \end{pmatrix}$$
are called the factors or common factors, and
$$\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_p \end{pmatrix}$$
are called errors or specific errors.
This is written in matrix notation as
$$\begin{pmatrix} X_1 - \mu_1 \\ \vdots \\ X_p - \mu_p \end{pmatrix} = \begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ \vdots & \ddots & \vdots \\ \lambda_{p1} & \cdots & \lambda_{pm} \end{pmatrix} \begin{pmatrix} F_1 \\ \vdots \\ F_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_p \end{pmatrix}$$
Our hypothesis is that these values arise from a linear combination of m factors, $F_1, \dots, F_m$, plus noise, $\varepsilon_j$. The model can be re-expressed as
$$X_j - \mu_j = \sum_{k=1}^{m} \lambda_{jk} F_k + \varepsilon_j, \qquad j = 1, 2, \dots, p \qquad (2.2)$$
and $\lambda_{jk}$ is called the loading of $X_j$ on the factor $F_k$.

In the factor matrix $\lambda$:
• Columns represent derived factors.
• Rows represent input variables.
• Loadings represent the degree to which each variable "correlates" with each of the factors; for standardized variables and orthogonal factors, loadings range from -1 to 1.
• Inspection of the factor loadings reveals the extent to which each variable contributes to the meaning of each factor.
• High loadings (which play the role of regression coefficients) provide the meaning and interpretation of the factors.

1.2.2. Review of Previous Studies
For this research work, related theses, websites, books, journals and articles were studied. Charles Spearman was the first to use factor analysis in developing psychology and is sometimes credited with its invention. He discovered that schoolchildren's scores on a variety of seemingly unrelated subjects are positively correlated. This led him to postulate that a General Mental Ability (GMA) underlies and shapes human cognitive performance. This postulate, known as g theory, now enjoys broad support in the field of intelligence research.
Raymond Cattell expanded on Spearman's idea of a two-factor theory of intelligence after performing his own tests on factor analysis. He used a multi-factor theory to explain intelligence. Cattell's theory addressed alternate factors in intellectual development, including motivation and psychology. Cattell also developed several mathematical methods for adjusting psychometric graphs, such as his "scree" test and similarity coefficients. His research led to the development of his theory of fluid and crystallized intelligence as well as his sixteen (16) personality factors theory of personality.
[1] examined the information reported in 60 exploratory factor analyses published before 1999. The authors focused on studies that employed at least one exploratory factor analysis strategy. Although Henson and Roberts noted that most of the articles reported researcher objectives that warranted an EFA design, nearly 57% of the researchers engaged in principal components analysis. As a suggested rationale for this problematic model selection, the authors noted that principal components analysis was the "default option for most statistical software packages" ([1]).
The goal of factor retention guidelines is to identify the number of factors needed to account for the correlations among measured variables. Empirical research suggests that under-factoring (retaining too few factors) is more problematic than over-factoring ([2]). However, over-factoring is not ideal either; for example, when over-factoring, researchers may postulate the existence of factors with no theoretical basis, which can "accentuate poor decision made at other steps in factor analysis" ([2]).
Orthogonal rotation algorithms yield uncorrelated factors ([2]; [3]; [4]). The most commonly employed type of orthogonal rotation is varimax ([2]). Although orthogonal rotation yields simple structure, using it when the factors are correlated in the population results in the loss of important information ([5]).
In published literature, orthogonal strategies appear to be the most commonly cited rotation procedures. According to the findings of [6], researchers reported using an orthogonal rotation strategy most frequently (40%); in only 18% of the studies did researchers report the use of an oblique rotation strategy. More recently, in their study of common practices in factor analytic research, [1] found that 55% of the articles included orthogonal rotation strategies; researchers reported the use of oblique rotation strategies in 38.3% of the articles, and in 1.7% of the articles researchers failed to report any factor rotation method.
The "conventional wisdom advises researchers to use orthogonal rotation because it produces more easily interpretable results . . ." ([5]). However, this argument is flawed in two respects. Firstly, social science researchers "generally expect some correlation among factors" ([5]); therefore, the use of orthogonal rotation results in the loss of information concerning the correlations among factors. Secondly, output associated with oblique rotation is "only slightly more complex" than orthogonal rotation output and yields substantive interpretations that "are essentially the same" ([5]).
[7], in his book Robustness of the Maximum Likelihood Estimation Procedure in Factor Analysis, generated random variables independently from six distributions on a high-speed computer and used them to represent the common and specific factors in a factor analysis model in which the coefficients of these factors had been specified. Using Lawley's approximate χ² statistic to evaluate the estimates obtained, the estimation procedure was found to be insensitive to changes in the distributions considered.
A Monte Carlo study comparing three methods for determining the number of principal components and factors was conducted by [7]. The results confirm the findings of previous papers that the Kaiser criterion has the poorest performance of the three methods analyzed. Parallel analysis is overall the most accurate, although when the true number of factors/components is small, the acceleration factor can outperform it. The accuracy of the acceleration factor and the Kaiser criterion varies with the true number of factors/components and the number of variables, whereas parallel analysis is affected only by the sample size. The Kaiser criterion tends to overestimate, and the acceleration factor to underestimate, the number of factors/components. Parallel analysis shows fewer fluctuations in its accuracy and is more robust.
[8] conducted research on the principal component procedure in factor analysis and its robustness. In her study she noted that the principal component procedure has been widely used in factor analysis as a data reduction procedure, and that the estimation of the covariance and correlation matrices in factor analysis using the principal component procedure is strongly influenced by outliers. The study investigated the robustness of the principal component procedure in factor analysis by generating random variables from five different distributions, which were used to determine the common and specific factors. The results revealed that the variance of the first factor varied widely from distribution to distribution, ranging from 0.6730 to 5.9352, and the contribution of the first factor to the total variance ranged from 15 to 98%. It was therefore concluded that the principal component procedure is not robust in factor analysis.
The book Exploratory Factor Analysis by [2] summarizes the key issues that researchers need to take into consideration when choosing and implementing exploratory factor analysis (EFA), before offering some conclusions and recommendations to help readers who are contemplating the use of EFA in their own research. It reviews the basic assumptions of the common factor model, the general mathematical model on which EFA is based, which is intended to explain the structure of correlations among a battery of measured variables; the issues that researchers should bear in mind in determining when it is appropriate to conduct an EFA; the decisions to be made in conducting an EFA; and
the implementation of EFA as well as the interpretation of the data it provides.
The study Robustness of the Maximum Likelihood Estimation Procedure in Factor Analysis by [9] generated random variables from five distributions, which were used to represent the common and specific factors in factor analysis in order to determine the robustness of the maximum likelihood estimation procedure. Five response variables, each with two factors, were chosen for the study. The chosen variables were transformed into linear combinations of an underlying set of hypothesized or unobserved components (factors). The results revealed that the estimates of the variance of the first factor were almost the same and closely related to each other in all the distributions considered. The chi-square test conducted led to the conclusion that the maximum likelihood method of estimation is robust in factor analysis.
[10] proposed a robust method of estimating the covariance matrix in multivariate data analysis. The goal was to compare the proposed method with the most widely used robust methods (Minimum Volume Ellipsoid and Minimum Covariance Determinant) and the classical method (MLE) in detecting outliers at different levels and magnitudes. The proposed robust method competes favorably with both MVE and MCD, and performed better than either of them in detecting a single outlier or a few outliers, especially for small sample sizes and when the magnitude of the outliers is relatively small.

2. Methodology
2.1. Research Design
The research method used in this study is a Monte Carlo design to generate the data sets. Samples were simulated with different numbers of variables and sample sizes; specifically, a two (number of variables) by two (sample size) by five (distributions) design.

2.2. Population of the Study
This simulation was developed in RGui (64-bit), in which the program associated with this design was written. To enhance this project's generalizability, the study included simulated data sets of varying size, containing 8 and 20 variables. The two sample sizes included in this study were 20 (small sample size) and 500 (large sample size). The initial factor solutions from each model in each condition were subjected to an orthogonal rotation strategy; specifically, this study employed varimax rotation in all simulated contexts. Because the intent of this study is to address methodological issues that are frequently encountered in social science literature, varimax rotation was considered the more appropriate choice ([2]; [11]).

2.3. Sample Size and Sampling Procedure
Determining the sample size is a very important issue in factor analysis, because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results. Larger samples are better than smaller samples (all other things being equal) because they tend to minimize the probability of errors, maximize the accuracy of population estimates, and increase the generalizability of the results. Factor analysis is a technique that requires a large sample size: it is based on the correlation matrix of the variables involved, and correlations usually need a large sample before they stabilize.
[12] suggested that "the adequacy of sample size might be evaluated very roughly on the following scale: 50 – very poor; 100 – poor; 200 – fair; 300 – good; 500 – very good; 1000 or more – excellent" (p. 217). It is often stated that a sample size of 200 or more is sufficient. As the sample size grows, maximum likelihood estimates become consistent and asymptotically unbiased and efficient. In our experiment we therefore consider two sample sizes: 20 (small) and 500 (large).

2.4. Instrumentation/Factor Extraction Method
Maximum likelihood factoring allows the researcher to test for statistical significance in terms of the correlations among factors and the factor loadings, but this method of estimating factor models can yield distorted results when the observed data are not multivariate normal ([5]; [1]). Principal axis factoring does not rely on distributional assumptions and is more likely than maximum likelihood to converge on a solution. However, principal axis factoring does not provide the variety of fit indices associated with maximum likelihood methods, and it does not lend itself to the computation of confidence intervals and tests of significance.
Consider a data vector for subject i on p variables, represented as:
$$X_i = (X_{i1}, X_{i2}, \dots, X_{ip})', \qquad i = 1, 2, \dots, n$$
The data matrix X is standardized, since the p variables could be measured in different units. The standardized matrix Z is written as $Z = (V^{1/2})^{-1}(X - \mu)$, where $V^{1/2}$ is the diagonal standard deviation matrix and $\mu$ is the vector of means. Clearly, $E(Z) = 0$ and $\operatorname{Cov}(Z) = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1} = \rho$, where $\Sigma$ is the variance-covariance matrix and $\rho$ is the population correlation matrix. The kth principal component of $Z = (Z_1, Z_2, \dots, Z_p)'$ is given by
$$Y_k = e_k' Z = e_k'(V^{1/2})^{-1}(X - \mu), \qquad k = 1, 2, \dots, p \qquad (3.1)$$
where $(\lambda_k, e_k)$ is the kth eigenvalue-eigenvector pair of the correlation matrix, with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$. Since each $Z_i$ is a standardized variable,
$$\sum_{i=1}^{p} \operatorname{Var}(Y_i) = \sum_{i=1}^{p} \operatorname{Var}(Z_i) = p,$$
and the correlation coefficient between component $Y_k$ and standardized variable $Z_l$ is $\rho_{Y_k, Z_l} = e_{kl}\sqrt{\lambda_k}$, for k, l = 1, 2, …, p. The proportion of standardized population variance due to
the kth principal component is $\lambda_k/p$.
A correlation matrix can be thought of as the matrix of variances and covariances of a set of variables that have standard deviations of 1. It can be expressed as a function of its eigenvalues $\lambda_k$ and eigenvectors $e_k$ as follows:
$$\rho = \sum_{k=1}^{p} \lambda_k e_k e_k' \qquad (3.2)$$
The correlation matrix is modeled as $\rho = LL' + \Psi$, where $\Psi$ is a diagonal matrix of specific variances. As principal component analysis takes all variance into account, $\Psi$ is assumed to be zero and the correlation matrix is modeled as $\rho = LL'$. A PCA procedure approximates the correlation matrix by a summation over $m < p$ terms, i.e.
$$\rho \cong \sum_{i=1}^{m} \lambda_i e_i e_i' = \left(\sqrt{\lambda_1}\,e_1 \;\; \sqrt{\lambda_2}\,e_2 \;\; \cdots \;\; \sqrt{\lambda_m}\,e_m\right) \begin{pmatrix} \sqrt{\lambda_1}\,e_1' \\ \sqrt{\lambda_2}\,e_2' \\ \vdots \\ \sqrt{\lambda_m}\,e_m' \end{pmatrix} = LL' \qquad (3.3)$$
where L is the matrix of factor loadings; the loading of variable j on the ith factor is estimated as $L_{ji} = \sqrt{\lambda_i}\,e_{ij}$, with $e_{ij}$ the jth element of $e_i$.

2.5. Data Collection Procedure
2.5.1. Extraction by Principal Component Analysis Method
The principal factor method involves finding an approximation $\tilde{\Psi}$ of $\Psi$, the matrix of specific variances, and then correcting R, the correlation matrix of X, by $\tilde{\Psi}$. The principal component method is based on an approximation $\hat{Q}$ of $Q$, the factor loadings matrix. The sample covariance matrix is diagonalized, $S = \Gamma \Lambda \Gamma^T$. Then the first k eigenvectors are retained to build
$$\hat{Q} = \left(\sqrt{\lambda_1}\,\gamma_1, \dots, \sqrt{\lambda_k}\,\gamma_k\right).$$
The estimated specific variances are provided by the diagonal elements of the matrix $S - \hat{Q}\hat{Q}^T$:
$$\tilde{\Psi} = \begin{pmatrix} \psi_{11} & 0 & \cdots & 0 \\ 0 & \psi_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_{pp} \end{pmatrix}, \qquad \psi_{jj} = s_{X_j X_j} - \sum_{l=1}^{k} \hat{q}_{jl}^2.$$
By construction, the diagonal elements of S are equal to the diagonal elements of $\hat{Q}\hat{Q}^T + \tilde{\Psi}$, whereas the off-diagonal elements are not necessarily reproduced. How good, then, is this approximation? Consider the residual matrix
$$S - (\hat{Q}\hat{Q}^T + \tilde{\Psi})$$
resulting from the principal component solution. Analytically,
$$\sum_{i,j} \left(S - \hat{Q}\hat{Q}^T - \tilde{\Psi}\right)_{ij}^2 \le \lambda_{k+1}^2 + \dots + \lambda_p^2.$$

2.5.2. Extraction by Maximum Likelihood Factor Analysis
In finding factors that reproduce the observed correlations or covariances between the variables as closely as possible, a maximum likelihood estimation (MLE) procedure finds the factors that maximize the likelihood of producing the observed correlation matrix. In doing so, it assumes that the data are independently sampled from a multivariate normal distribution with mean vector $\mu$ and variance-covariance matrix of the form $\Sigma = LL' + \Psi$, where L is the matrix of factor loadings and $\Psi$ is the diagonal matrix of specific variances. The MLE procedure estimates $\mu$, the matrix of factor loadings L, and the specific variances $\Psi$ from the log-likelihood function, which is given by
$$l(\mu, L, \Psi) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\left|LL' + \Psi\right| - \frac{1}{2}\sum_{i=1}^{n} (X_i - \mu)'\left(LL' + \Psi\right)^{-1}(X_i - \mu).$$
By maximizing this log-likelihood function, the maximum likelihood estimators of $\mu$, L and $\Psi$ are obtained.

2.5.3. Extraction by Principal Axis Factoring
The most widely used method of extraction in factor analysis is the principal axis factoring (PAF) method. It seeks the smallest number of factors that can account for the common variance of a set of variables. In practice, PAF uses a PCA strategy but applies it to a slightly different version of the correlation matrix. Because the analysis of data structure in PAF focuses on common variance rather than on sources of error specific to individual measurements, the correlation matrix $\rho$ in PAF has estimates of the communalities as its diagonal entries, instead of the 1's used in PCA.
Allowing for specific variance, the correlation matrix is estimated as $\rho = LL' + \Psi$, where $\Psi$ is a diagonal matrix of specific variances. The specific variances are obtained from the diagonal of $\rho - LL'$, where the matrix $LL'$ is as defined in (3.3); that is, the diagonal entries of $\Psi$ are estimated as
$$\psi_j = s_j^2 - \sum_{i=1}^{m} \lambda_i e_{ij}^2, \qquad j = 1, \dots, p.$$

2.6. Method of Data Analysis
For this study, simulated data were used to examine the effects of (1) the extraction method (Principal Component Analysis, Maximum Likelihood and Principal Axis Factoring), (2) small and large numbers of variables, and (3) small and large sample sizes, using varimax rotation, on five statistical distributions: Uniform, Normal, Gamma, Exponential and Laplace. Our goal was to use methods that most closely simulate real practice and real data, so that our results shed light on the effects of current research practice and help advise researchers on the best extraction method to adopt in data analysis.
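The extraction steps of Sections 2.4 and 2.5 can be sketched in a few lines of numerical code. The authors' R program is not reproduced in the paper, so the Python/NumPy sketch below is only an illustration under stated assumptions. It standardizes the data, computes principal component loadings from the eigen-decomposition of the correlation matrix as in (3.3), and runs an iterated principal axis factoring that replaces the diagonal of R by communalities (initialized, by assumption, with squared multiple correlations). It then evaluates, without maximizing, the normal log-likelihood of Section 2.5.2 at given L and Ψ, with µ set to the sample mean. The two-factor loading matrix, the seed, the normal draws and the function names `pca_loadings`, `paf_loadings` and `ml_loglik` are hypothetical choices made only for this example.

```python
import numpy as np


def standardize(X):
    """Z = (V^{1/2})^{-1}(X - mu), using sample means and standard deviations."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def top_eig(A, m):
    """The m largest eigenvalues of the symmetric matrix A and their eigenvectors (columns)."""
    vals, vecs = np.linalg.eigh(A)               # eigh returns eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:m]
    return vals[idx], vecs[:, idx]


def pca_loadings(R, m):
    """PC loadings as in (3.3): column i is sqrt(lambda_i) * e_i; psi is the diagonal of R - LL'."""
    vals, vecs = top_eig(R, m)
    L = vecs * np.sqrt(vals)
    psi = np.diag(R) - np.sum(L**2, axis=1)
    return L, psi


def paf_loadings(R, m, max_iter=500, tol=1e-8):
    """Iterated PAF: the PCA eigen step applied to R with communalities on its diagonal."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # assumption: start from squared multiple correlations
    for _ in range(max_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)          # reduced correlation matrix
        vals, vecs = top_eig(R_reduced, m)
        L = vecs * np.sqrt(np.clip(vals, 0.0, None))
        h2_new = np.sum(L**2, axis=1)            # updated communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return L, 1.0 - h2                           # loadings and specific variances


def ml_loglik(Z, L, psi):
    """Gaussian log-likelihood l(mu, L, Psi) of Section 2.5.2, evaluated at mu = sample mean."""
    n, p = Z.shape
    Sigma = L @ L.T + np.diag(psi)
    D = Z - Z.mean(axis=0)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.sum(D * np.linalg.solve(Sigma, D.T).T)   # sum_i d_i' Sigma^{-1} d_i
    return -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad


# Hypothetical data: n = 500 draws of p = 8 variables from the two-factor model (2.1)
rng = np.random.default_rng(0)
n, p, m = 500, 8, 2
Lambda = np.zeros((p, m))
Lambda[:4, 0] = 0.8                               # variables 1-4 load on factor 1
Lambda[4:, 1] = 0.7                               # variables 5-8 load on factor 2
F = rng.standard_normal((n, m))
E = rng.standard_normal((n, p)) * np.sqrt(1.0 - np.sum(Lambda**2, axis=1))
X = F @ Lambda.T + E

Z = standardize(X)
R = np.corrcoef(Z, rowvar=False)
L_pc, psi_pc = pca_loadings(R, m)
L_paf, psi_paf = paf_loadings(R, m)
print("SS loadings (PC): ", np.round(np.sum(L_pc**2, axis=0), 3))
print("SS loadings (PAF):", np.round(np.sum(L_paf**2, axis=0), 3))
print("log-likelihood at PAF estimates:", round(float(ml_loglik(Z, L_paf, psi_paf)), 1))
```

Because the reduced matrix used by PAF carries communalities no larger than 1 on its diagonal instead of 1's, its leading eigenvalues, and therefore the unrotated SS loadings it produces, cannot exceed the PCA values obtained from the same correlation matrix.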
2.7. Data Presentation, Analysis and Interpretation
2.7.1. Presentation of Data
The first Principal Component/Maximum Likelihood/Principal Axis loadings from the simulation of each sample from the various distributions, for each extraction method, are presented in Table 1.

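Table 1. First Principal Component/Maximum Likelihood/Principal Axis Loadings from the simulation

Sample    Uniform                Normal                 Gamma                  Exponential            Laplace
          PC     ML     PAF      PC     ML     PAF      PC     ML     PAF      PC     ML     PAF      PC     ML     PAF
V8n20     1.54   1.54   1.5      0.9    1.15   1        1.2    0.99   1.3      1.2    1.09   1.5      0.3    0.71   0.61
V20n20    -0.04  N/A    N/A      0.4    N/A    N/A      1.2    N/A    N/A      0.6    N/A    N/A      0.5    N/A    N/A
V8n500    1.77   0.87   0.8      0.7    0.997  0.6      2      0.99   1        0.8    0.19   0.3      2.1    1.098  1.23
V20n500   0.95   0.397  0.9      1.2    0.11   0.5      0.2    0.27   0.1      1.6    1.01   0.8      0.3    0.195  -0.5

V8n20 denotes 8 variables and n = 20 observations (likewise for the other rows); PC = principal component, ML = maximum likelihood, PAF = principal axis factoring; N/A = not available.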

Figure 1. First Principal Component/Maximum Likelihood/Principal Axis Loadings from the simulation (first factor/component loadings of PC/ML/PAF for the Uniform, Normal, Gamma, Exponential and Laplace distributions)

The sum of squared (SS) loadings and the proportion of variance of the 1st principal component/maximum likelihood/principal axis factor for each sample from the various distributions, for each extraction method, are presented in Tables 2 and 3.

Table 2. Simulation Results for Sample = 20

                                     Sample distributions (v8 = 8 variables, v20 = 20 variables)
Extraction  Sample  1st PC/ML/PA     Uniform        Normal         Gamma          Exponential    Laplace
method      (n)     performance      v8     v20     v8     v20     v8     v20     v8     v20     v8     v20
PCA         20      SS loadings      2.05   3.51    1.74   2.9     1.85   2.91    2.14   3.44    2.09   3.13
                    Proportion Var   0.26   0.18    0.22   0.15    0.23   0.15    0.27   0.17    0.26   0.16
MLFA        20      SS loadings      1.63   N/A     1.357  N/A     1.479  N/A     1.888  N/A     1.557  N/A
                    Proportion Var   0.204  N/A     0.17   N/A     0.185  N/A     0.236  N/A     0.195  N/A
PAFA        20      SS loadings      1.55   N/A     1.14   N/A     1.59   N/A     2.23   N/A     1.52   N/A
                    Proportion Var   0.19   N/A     0.14   N/A     0.2    N/A     0.28   N/A     0.19   N/A
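The two rows reported for each method in Table 2 are linked by the same identity used for $\lambda_k/p$ in Section 2.4: with p standardized variables the total variance is p, so Proportion Var = SS loadings / p. The short Python check below copies only the v8 values from Table 2 and recomputes the proportions; the variable and function names are illustrative, not part of the study.

```python
# SS loadings and Proportion Var for the v8 columns of Table 2 (n = 20),
# in the order Uniform, Normal, Gamma, Exponential, Laplace.
ss_v8 = {
    "PCA": [2.05, 1.74, 1.85, 2.14, 2.09],
    "MLFA": [1.63, 1.357, 1.479, 1.888, 1.557],
    "PAFA": [1.55, 1.14, 1.59, 2.23, 1.52],
}
reported_prop_v8 = {
    "PCA": [0.26, 0.22, 0.23, 0.27, 0.26],
    "MLFA": [0.204, 0.17, 0.185, 0.236, 0.195],
    "PAFA": [0.19, 0.14, 0.2, 0.28, 0.19],
}
P = 8  # number of standardized variables, so the total variance is 8


def proportion_var(ss, p):
    """Share of the total standardized variance p carried by one factor."""
    return ss / p


for method, ss_values in ss_v8.items():
    recomputed = [round(proportion_var(ss, P), 3) for ss in ss_values]
    print(f"{method:5s} recomputed {recomputed}  reported {reported_prop_v8[method]}")
```

Every recomputed value agrees with the reported one to within the rounding of the table (at most 0.005); the same identity holds, within rounding, for the v20 columns and for Table 3 with p = 20.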
Figure 2. Simulation Results for SS Loadings of the 1st PC/ML/PA at 20 Samples (SS loadings for v8 and v20 under the Uniform, Normal, Gamma, Exponential and Laplace distributions)

Figure 3. Simulation Results for Proportion Variance of the 1st PC/ML/PA at 20 Samples (proportion of variance for v8 and v20 under the Uniform, Normal, Gamma, Exponential and Laplace distributions)

Table 3. Simulation Results for Sample = 500

                                     Sample distributions (v8 = 8 variables, v20 = 20 variables)
Extraction  Sample  1st PC/ML/PA     Uniform        Normal         Gamma          Exponential    Laplace
method      (n)     performance      v8     v20     v8     v20     v8     v20     v8     v20     v8     v20
PCA         500     SS loadings      1.23   1.39    1.13   1.4     1.19   1.33    1.2    1.36    1.22   1.4
                    Proportion Var   0.15   0.07    0.14   0.07    0.15   0.07    0.15   0.07    0.15   0.07
MLFA        500     SS loadings      1.03   0.429   1.016  0.47    0.278  0.416   0.25   0.436   1.04   0.443
                    Proportion Var   0.129  0.021   0.127  0.024   0.035  0.021   0.031  0.022   0.13   0.022
PAFA        500     SS loadings      0.35   0.43    0.56   0.47    0.27   0.4     0.25   0.43    0.71   0.44
                    Proportion Var   0.04   0.02    0.07   0.02    0.03   0.02    0.03   0.02    0.09   0.02
Figure 4. Simulation Results for SS Loadings of the 1st PC/ML/PA at 500 Samples (SS loadings for v8 and v20 under the Uniform, Normal, Gamma, Exponential and Laplace distributions)

Figure 5. Simulation Results for Proportion Variance of the 1st PC/ML/PA at 500 Samples (proportion of variance for v8 and v20 under the Uniform, Normal, Gamma, Exponential and Laplace distributions)

Figure 6. Principal Component Analysis, SS Loadings for 20 and 500 samples (PCA panels; v8 and v20 under the Uniform, Normal, Gamma, Exponential and Laplace distributions)

Figure 7. Principal Component Analysis, Proportion Variances for 20 and 500 samples (panels for n = 20 and n = 500; v8 and v20 under the Uniform, Normal, Gamma, Exponential and Laplace distributions)

2.7.2. Data Analysis and Results

Factor analysis uses variances to produce communalities between variables. The variance that a factor accounts for in a variable is equal to the square of that variable's loading on the factor, and (for uncorrelated factors) a variable's communality is the sum of its squared loadings across the retained factors. In many methods of factor analysis, the goal of extraction is to remove as much common variance as possible in the first factor.

From table 1, we compare the different extraction methods for each distribution. It was observed that:
• For the Uniform distribution, PC is the best extraction method for all numbers of variables and all sample sizes.
• For the Normal distribution, ML is the best extraction method with 8 (small) variables at both sample sizes, but PC is better with 20 (large) variables and a large sample size.
• For the Gamma distribution, PC and PAF are better with 8 (small) variables and 20 (small) observations; PC is best with 8 (small) variables and 500 (large) observations, while ML is better with 20 (large) variables and a large sample size.
• For the Exponential distribution, all the extraction methods studied perform well, but PAF is best with a small number of variables and a small sample size, while PC is best when the number of variables or observations is large.
• For the Laplace distribution, ML and PAF are both good methods for a small number of variables and a small sample size, though ML performed better. PC performed better with large numbers of variables and large sample sizes, and also with a small number of variables and a large sample size.

Comparing the same numbers of variables and sample sizes over the different distributions, table 1 also shows that:
• With 8 (small) variables and 20 (small) observations, all the extraction methods can be used for analysis, since the sums of the factor loadings are above 0.5 across all the distributions, except for the PC method under the Laplace distribution.
• Only PC can be used for analysis when the number of variables equals the sample size (20 variables and 20 observations).
• With 8 (small) variables and 500 (large) observations, PC performed better across all distributions except the Normal, where ML is slightly higher; the Uniform, Gamma and Laplace distributions perform best when a large sample size is combined with a small number of variables using PCA.
• With 20 (large) variables and 500 (large) observations, PC performed better across all the distributions.

Figure 2 shows that, when the sample size is small, PC yields higher sums of squared loadings than the other extraction methods for both small and large numbers of variables and for all the distributions used in this study.

Figure 3 shows the proportion of variance explained for small sample sizes with a small number of variables: PC is the best method for all distributions, although PAF explains the greatest proportion under the Exponential distribution. In the other small-sample, small-variable settings, PAF is also seen to perform very well.

Figure 4 shows that, when the sample size is large, PC gives the highest SS loadings for all numbers of variables and all distributions.

In Figure 5, for the large sample size, PC explains the highest proportion of variance across the distributions. The proportion of variance explained by PC also appears to be roughly equal across the distributions for the small number of variables, and likewise for the large number of variables.

Figure 6 shows that, for SS loadings across the sample sizes, PC performed better with small sample sizes than with large ones. The same holds for the proportion of variance, as observed in Figure 7.
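The quantities compared in table 1 and Figures 2 to 7 (communalities, SS loadings and proportion of variance) all follow from a loading matrix. The sketch below is a minimal illustration under stated assumptions, not the authors' simulation code: it draws data from a hypothetical two-factor model with 8 variables, adds unit-variance unique errors from each of the five distributions studied, extracts two principal components from the correlation matrix, and reports SS loadings (column sums of squared loadings) and proportion of variance (SS loadings divided by the number of variables). The loading pattern, the way non-normality enters (through the unique errors only), and the helper names `make_errors`, `simulate`, `pc_extraction` and `summarize` are all illustrative assumptions.

```python
import numpy as np

def make_errors(dist, size, rng):
    """Hypothetical error generator: mean 0, variance 1, shape set by `dist`."""
    if dist == "normal":
        return rng.standard_normal(size)
    if dist == "uniform":
        return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size)  # variance a^2/3 = 1
    if dist == "exponential":
        return rng.exponential(1.0, size) - 1.0                # mean 1, variance 1
    if dist == "laplace":
        return rng.laplace(0.0, 1.0 / np.sqrt(2.0), size)      # variance 2b^2 = 1
    if dist == "gamma":
        k = 2.0                                                # shape k, scale 1
        return (rng.gamma(k, 1.0, size) - k) / np.sqrt(k)      # mean k, variance k
    raise ValueError(f"unknown distribution: {dist}")

def simulate(n, loadings, dist, rng):
    """n draws from X = F L^T + E with standard-normal factors F and unique
    errors E scaled so that every variable has unit variance."""
    p, m = loadings.shape
    factors = rng.standard_normal((n, m))
    unique_sd = np.sqrt(1.0 - (loadings ** 2).sum(axis=1))
    errors = make_errors(dist, (n, p), rng) * unique_sd
    return factors @ loadings.T + errors

def pc_extraction(x, m):
    """Loadings on the first m principal components of the correlation matrix:
    eigenvectors scaled by the square roots of their eigenvalues."""
    r = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(r)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]      # indices of the m largest
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def summarize(loadings):
    """Communalities (row sums), SS loadings (column sums) and proportion of
    variance (SS loadings / number of variables) from squared loadings."""
    sq = loadings ** 2
    communalities = sq.sum(axis=1)
    ss_loadings = sq.sum(axis=0)
    proportion_var = ss_loadings / loadings.shape[0]
    return communalities, ss_loadings, proportion_var

rng = np.random.default_rng(2020)
L = np.zeros((8, 2))        # hypothetical pattern: 4 variables per factor
L[:4, 0] = 0.7
L[4:, 1] = 0.7
for dist in ("normal", "uniform", "exponential", "laplace", "gamma"):
    x = simulate(500, L, dist, rng)
    h2, ss, pv = summarize(pc_extraction(x, 2))
    print(f"{dist:12s} SS loadings {np.round(ss, 2)}  proportion var {np.round(pv, 2)}")
```

For unrotated principal components the SS loadings equal the leading eigenvalues of the correlation matrix, so the communalities and the SS loadings have the same total. A full Monte Carlo study, such as the 1000-replication design described in the Summary, would repeat the loop and average these summaries.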
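The table 1 finding that only PC can be used when the number of variables equals the sample size (20 variables, 20 observations) has one plausible explanation, offered here as our reading rather than a claim made by the study: after centering, 20 observations span at most 19 dimensions, so the 20 x 20 sample correlation matrix R is singular. PC needs only an eigendecomposition of R, whereas iterated principal axis factoring commonly starts from squared multiple correlations, 1 - 1/(R^-1)_ii, which require the inverse of R, and the ML discrepancy function contains ln|S|, which is undefined when S is singular. The sketch below uses NumPy only; the helpers `smc`, `principal_axis` and `smallest_eigenvalue` and the two-factor check pattern are hypothetical illustrations. It implements a basic iterated PAF and compares the smallest eigenvalue of R at 20 and 500 observations.

```python
import numpy as np

def smc(r):
    """Squared multiple correlations 1 - 1/diag(R^-1); R must be nonsingular."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(r))

def principal_axis(r, m, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring on correlation matrix r with m factors.

    The diagonal of r is replaced by the current communalities, the reduced
    matrix is eigen-decomposed, and the communalities are re-estimated from the
    m leading loadings until they stop changing. Heywood cases are not handled.
    """
    h2 = smc(r)
    for _ in range(max_iter):
        reduced = r.copy()
        np.fill_diagonal(reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1][:m]
        # Negative eigenvalues of the reduced matrix are clipped to zero.
        loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

def smallest_eigenvalue(n, p, rng):
    """Smallest eigenvalue of the sample correlation matrix of n x p normal data."""
    x = rng.standard_normal((n, p))
    r = np.corrcoef(x, rowvar=False)
    return np.linalg.eigvalsh(r)[0]  # eigvalsh returns eigenvalues in ascending order

rng = np.random.default_rng(7)
for n in (20, 500):
    print(f"n={n}, p=20: smallest eigenvalue of R = {smallest_eigenvalue(n, 20, rng):.2e}")

# Well-conditioned check: population R from a hypothetical two-factor pattern.
L = np.zeros((8, 2))
L[:4, 0] = 0.7
L[4:, 1] = 0.7
R = L @ L.T
np.fill_diagonal(R, 1.0)
loadings, h2 = principal_axis(R, 2)
print(np.round(h2, 3))
```

With 20 observations the smallest eigenvalue is zero up to rounding error, so any step that inverts R becomes numerically meaningless, while with 500 observations it stays well away from zero. On the population matrix the iteration recovers the true communalities of 0.49.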

3. Discussion on Findings

From Figure 1 above, we can observe that the Uniform, Gamma and Laplace distributions perform best when a large sample size is combined with a small number of variables using PCA. Also, the proportion of variance explained by PCA is higher with a small number of variables than with a large number of variables.

Similarly, when the three extraction methods were applied to the real-life dataset used in this study, PCA and PAF achieved a simple structure of the loadings following rotation. The loadings from each method are rather similar and do not differ significantly, though the principal component method yielded factors that load more heavily on the variables which the factors hypothetically represent. Moreover, the factors resulting from the principal component method explain 54% of the cumulative variance, compared to 46% from the principal factor method.

4. Summary

The broad purpose of factor analysis is to summarize data so that relationships and patterns can be easily interpreted and understood. It is normally used to regroup variables into a limited set of clusters based on shared variance; hence, it helps to isolate constructs and concepts.

The goal of this paper is to collect information that will allow researchers and practitioners to understand the various choices of factor extraction available, and to make decisions about "best practices" in exploratory factor analysis. A Monte Carlo study has been performed to assess the accuracy of three frequently used methods for the extraction of factors and components in factor analysis and principal component analysis: Principal Components Analysis, Maximum Likelihood Factor Analysis, and Principal Axis Factor Analysis.

Data were generated independently in a sequence of 1000 simulation replications, so that each dataset would have a specific number of underlying components or factors. The three methods for extracting factors/components were then applied and their overall performance examined. For each distribution, number of variables and sample size, the SS loadings and the proportion of variance were computed under each extraction method from the simulated data. The procedure was performed by simulating datasets with 8 and 20 variables over sample sizes of 20 and 500, for all combinations of the two.

In summary, our research seems to agree with [2], which says that when each factor is represented by three to four measured variables and the communalities exceed .70, relatively small sample sizes will allow researchers to make accurate estimates about population parameters, as observed with the real-life dataset. To avoid distortions derived from sample characteristics, researchers can select a sample that maximizes variance on measured variables that are not relevant to the construct of interest ([2]).

5. Conclusions

From some of the literature reviewed, many researchers believe that there is almost no evidence regarding which method should be preferred for different types of factor patterns and sample sizes. Some of these studies informed this research, and the following conclusions can be drawn from the results obtained by the Monte Carlo simulations:
• For the Uniform distribution, PC is the best extraction method for all numbers of variables and all sample sizes.
• For the Normal distribution, ML is the best extraction method with 8 (small) variables at both sample sizes, but PC is better with 20 (large) variables and a large sample size.
• For the Gamma distribution, PC and PAF are better with 8 (small) variables and 20 (small) observations; PC is best with 8 (small) variables and 500 (large) observations, while ML is better with 20 (large) variables and a large sample size.
• For the Exponential distribution, all the extraction methods studied perform well, but PAF is best with a small number of variables and a small sample size, while PC is best when the number of variables or observations is large.
• For the Laplace distribution, ML and PAF are both good methods for a small number of variables and a small sample size, though ML performed better. PC performed better with large numbers of variables and large sample sizes, and also with a small number of variables and a large sample size.

Comparing the same numbers of variables and sample sizes over the different distributions, table 1 also shows that:
• With 8 (small) variables and 20 (small) observations, all the extraction methods can be used for analysis, since the sums of the factor loadings are above 0.5 across all the distributions, except for the PC method under the Laplace distribution.
• Only PC can be used for analysis when the number of variables equals the sample size (20 variables and 20 observations).
• With 8 (small) variables and 500 (large) observations, PC performed better across all distributions except the Normal, where ML is slightly higher; the Uniform, Gamma and Laplace distributions perform best when a large sample size is combined with a small number of variables using PCA.
• With 20 (large) variables and 500 (large) observations, PC performed better across all the distributions.

6. Recommendations

This thesis analyses only three of the most commonly used methods for extracting factors and components in FA and PCA. However, there are other developed methods, such as Image Factor Extraction, Unweighted Least Squares Factoring, Generalized Least Squares Factoring and Alpha Factoring and used only in

principal component analysis. A review of these methods would be a suitable continuation of the current thesis.

Knowledge from studies such as this one can help researchers pinpoint the best method for extracting factors or components, even when the methods give different estimates. To do so, one has to know the strengths and weaknesses of each method and how they compare to each other for different combinations of variables and sample sizes.

References

[1] Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393–416.
[2] Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299.
[3] Harman, H. H. (1976). Modern factor analysis (3rd ed., revised). Chicago, IL: University of Chicago Press.
[4] Stevens, J. P. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Erlbaum.
[5] Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research and Evaluation, 10(7), 1–9.
[6] Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6(2), 147–168.
[7] Fuller, E. L., & Hemmerle, W. J. (1966). Robustness of the maximum-likelihood estimation procedure in factor analysis. Psychometrika, 31, 255–266.
[8] Nwabueze, J. C. (2010). Principal component procedure in factor analysis and robustness. Academic Journals. https://academicjournals.org/journal/AJMC
[9] Nwabueze, J. C., Onyeagu, S. I., & Ikpegbu, O. (2009). Robustness of the maximum likelihood estimation procedure in factor analysis. African Journal of Mathematics and Computer Science Research, 2(5), 81–87.
[10] Oyeyemi, G. M., & Ipinyomi, R. A. (2010). Some robust methods of estimation in factor analysis in the presence of outliers. ICASTOR Journal of Mathematical Sciences, 4(1), 1–12.
[11] Raykov, T., & Little, T. D. (1999). A note on Procrustean rotation in exploratory factor analysis: A computer intensive approach to goodness-of-fit evaluation.
[12] Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

