PSYCHOMETRIKA, VOL. 52, NO. 3, 317-332
SEPTEMBER 1987
SPECIAL SECTION

FACTOR ANALYSIS AND AIC

HIROTUGU AKAIKE

THE INSTITUTE OF STATISTICAL MATHEMATICS
The information criterion AIC was introduced to extend the method of maximum likelihood
to the multimodel situation. It was obtained by relating the successful experience of the order
determination of an autoregressive model to the determination of the number of factors in the
maximum likelihood factor analysis. The use of the AIC criterion in the factor analysis is
particularly interesting when it is viewed as the choice of a Bayesian model. This observation
shows that the area of application of AIC can be much wider than the conventional i.i.d. type
models on which the original derivation of the criterion was based. The observation of the
Bayesian structure of the factor analysis model leads us to the handling of the problem of
improper solutions by introducing a natural prior distribution of factor loadings.
Key words: factor analysis, maximum likelihood, information criterion AIC, improper solution,
Bayesian modeling.
1. Introduction
The factor analysis model is characterized by the covariance structure

Σ = ΛΛ′ + Ψ.
The author would like to express his thanks to Jim Ramsay, Yoshio Takane, Donald Ramirez and
Hamparsum Bozdogan for helpful comments on the original version of the paper. Thanks are also due to
Emiko Arahata for her help in computing.
Requests for reprints should be sent to Hirotugu Akaike, The Institute of Statistical Mathematics, 4-6-7
Minami-Azabu, Minato-Ku, Tokyo 106, Japan.
In the case of the maximum likelihood factor analysis this is done by adopting the
likelihood ratio test. However, in this test procedure, the unstructured saturated model is
always used as the reference and the significance is judged by referring to a chi-square
distribution with a large number of degrees of freedom equal to the difference between the
number of parameters of the saturated model and that of the model being tested. As will
be seen in section 3, an example discussed by Jöreskog (1978) shows that direct application
of such a test to the selection of a factor analysis model is not quite appropriate.
There the expert's view clearly contradicts the conventional use of the likelihood ratio
test.
In 1969 the present author introduced the final prediction error (FPE) criterion for the
choice of the order of an autoregressive model of a time series (Akaike, 1969, 1970). The
criterion was defined by an estimate of the expected mean square one-step-ahead prediction
error of the model with parameters estimated by the method of least squares. The successful
experience of applying the FPE criterion to real data suggested the possibility of developing
a similar criterion for the choice of the number of factors in factor analysis. The choice of
the order of an autoregression controlled the number of unknown parameters in the model,
which in turn controlled the expected mean square one-step-ahead prediction error. By
analogy it was easily observed that control of the number of factors was required for the
control of the expected prediction error of the fitted model. However, it was not easy to
identify what the prediction error meant in the case of factor analysis.
In the case of the autoregressive model an estimate of the expected predictive performance
was adopted as the criterion; in the case of the maximum likelihood factor analysis it was
the fitted distribution that was evaluated by the likelihood. The realization of this fact
quickly led to the understanding that our prediction was represented by the fitted model in
the case of factor analysis, which then led to the understanding that the expectation of the
log likelihood with respect to the "true" distribution was related to the Kullback-Leibler
information that measures the deviation of the "true" distribution from the assumed model.
The analogy with the FPE criterion then led to the introduction of the criterion

AIC = (−2) log(maximum likelihood) + 2(number of parameters),

as the measure of the badness of fit of a model with parameters estimated by the method of
maximum likelihood, where log denotes the natural logarithm (Akaike, 1973, 1974). We will
present a simple explanation of AIC in the next section and illustrate its use by applying it
to an example in section 3.
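Before the formal explanation, a minimal sketch of how the criterion is used in practice may help; in the following Python fragment the model names, maximized log likelihoods, and parameter counts are purely hypothetical placeholders:

    # Minimal sketch of AIC-based model comparison; the maximized log
    # likelihoods and parameter counts below are hypothetical.
    def aic(max_log_likelihood, n_params):
        # AIC = (-2) log(maximum likelihood) + 2 (number of parameters)
        return -2.0 * max_log_likelihood + 2.0 * n_params

    candidates = {"model A": (-1523.4, 23), "model B": (-1510.1, 29)}
    scores = {name: aic(ll, m) for name, (ll, m) in candidates.items()}
    print(scores, "->", min(scores, key=scores.get))  # smaller AIC is better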
Although AIC produces a satisfactory solution to the problem of the choice of the
number of factors, the application of AIC is hampered by the frequent appearance of
improper solutions. This shows that successive increases of the number of factors quickly
lead to models that are not quite appropriate for the direct application of the method of
maximum likelihood.
In section 4 it will be argued that the factor analysis model may be viewed as a Bayesian
model and that the choice of a factor analysis model by minimizing the AIC criterion is
essentially the choice of a Bayesian model. This recognition encourages the use of further
Bayesian modeling for the elimination of improper solutions. In section 5 a natural prior
distribution for the factor loadings is introduced through the analysis of the likelihood
function. Numerical examples will be given in section 6 to show that the introduction of the
prior distribution suppresses the appearance of improper solutions and that the indefinite
increase of a communality produced by the conventional maximum likelihood procedure may
be viewed as of little practical significance.
The paper concludes with brief remarks on the contribution of factor analysis to the
development of general statistical methodology.
f(y | θ₀). Thus, to measure the deviation of θ̂(x) from θ₀ in terms of the basic criterion of
twice the expected log likelihood, the χ²_m variable must be subtracted twice from
2 log f(x | θ̂(x)) to make the difference of twice the log likelihoods an unbiased estimate of
the difference of twice the expected log likelihoods.

Since the χ²_m variable is unobservable, as we do not know θ₀, we consider the use of its
expected value m. The negative of the quantity thus obtained defines

AIC = (−2) log f(x | θ̂(x)) + 2m.
When several different models are compared, the one that gives the minimum of AIC
represents the best fit. Such an estimate is denoted as the MAICE (minimum AIC estimate).
For a more detailed discussion of the predictive point of view of statistics and of the use of
the information criterion, readers are referred to Akaike (1985).
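In compact form, the bias correction just described amounts to the approximate relation below, where y denotes a hypothetical replicate of the data x and m the number of estimated parameters (a LaTeX sketch of the argument above, not a rigorous statement):

    % The maximized log likelihood overestimates twice the expected log
    % likelihood by about 2m on average; removing that bias yields AIC.
    2\,\mathbb{E}_{y} \log f\bigl(y \mid \hat{\theta}(x)\bigr)
      \;\approx\; 2 \log f\bigl(x \mid \hat{\theta}(x)\bigr) - 2m,
    \qquad
    \mathrm{AIC} = (-2) \log f\bigl(x \mid \hat{\theta}(x)\bigr) + 2m.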
For the factor analysis model the covariance matrix of the vector of observations is assumed
to have the structure

Σ = ΛΛ′ + Ψ,

where Λ denotes the matrix of factor loadings and Ψ the uniqueness variance matrix. The
diagonal elements of ΛΛ′ define the communalities. The AIC statistic for the k-factor model
is then defined by

AIC(k) = (−2) log L(k) + [2p(k + 1) − k(k − 1)],

where p denotes the dimension of the observation vector and L(k) the maximized likelihood
of the k-factor model.
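The bracketed term is twice the number of free parameters of the k-factor model, p(k + 1) − k(k − 1)/2. A minimal Python sketch of the count (the log likelihood in the usage line is a hypothetical placeholder):

    # AIC(k) for a k-factor model with p observed variables: the number of
    # free parameters is p(k + 1) - k(k - 1)/2, so twice that count is
    # 2p(k + 1) - k(k - 1), matching the bracketed term above.
    def aic_k(log_L_k, p, k):
        return -2.0 * log_L_k + (2 * p * (k + 1) - k * (k - 1))

    # With p = 24, k = 4: free parameters = 24*5 - 6 = 114; the saturated
    # covariance model has 24*25/2 = 300, giving the 186 degrees of freedom
    # of the chi-square statistic quoted in the example below.
    print(aic_k(-1510.1, p=24, k=4))  # hypothetical maximized log likelihood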
To show the use of AIC in the maximum likelihood factor analysis, and in particular to
illustrate the difference between the AIC and the conventional test approach, here we will
discuss an example treated by Jöreskog (1978, p. 457). This example is concerned with the
analysis of Harman's example of twenty-four psychological variables. The unrestricted
four-factor model was fitted first, which produced

χ²₁₈₆ = 246.36.

This model was considered to be "representing a reasonably good fit," but a further
restriction of parameters produced a simple structure model with

χ²₂₃₁ = 301.42.

This model was accepted as the best fitting simple structure.
Now we have

Prob {χ²₁₈₆ ≥ 246.36 | H₀} ≈ 0.0009,

and

Prob {χ²₂₃₁ ≥ 301.42 | H₀′} ≈ 0.0005,

where H₀ and H₀′ denote the hypotheses of the four-factor model and the simple structure,
respectively, and the chi-squared variables stand for random variables with the respective
degrees of freedom. By the standard of conventional tests these figures show that the
results are extremely significant and both H₀ and H₀′ should be rejected. In spite of this,
the expert judgment of Jöreskog was to accept the four-factor model as a reasonable fit
and to prefer the simple structure model to the unrestricted one. This conclusion suggests that
the large values of the degrees of freedom appearing in the chi-squared statistics preclude
the application of conventional levels of significance, such as 0.05 or 0.01, in making the
final judgment of models in this situation.
The chi-squared statistic is defined by

χ² = (−2) max log L(H) − (−2) max log L(H∞),

where max log L(H) denotes the maximum log likelihood under the hypothesis H and H∞
denotes the saturated, completely unconstrained model. Since the AIC for a hypothesis H
is defined by

AIC(H) = (−2) max log L(H) + 2 dim θ,

where dim θ denotes the dimension of the vector of unknown parameters θ, we have
AIC(H₀) = 246.36 − 2 × 186 = −125.64,

and

AIC(H₀′) = 301.42 − 2 × 231 = −160.58.

Since AIC(H∞) = 0, these AIC's show that both H₀ and H₀′ are by far better than H∞ and
that the simple structure model H₀′ shows a better fit than the unrestricted four-factor
model H₀.
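These figures are easy to check numerically; a small sketch using SciPy (the exact p-values printed may differ slightly in the last digits from the approximate values quoted above):

    # Chi-square tail probabilities and AIC values relative to the
    # saturated model for Jöreskog's example, as discussed in the text.
    from scipy.stats import chi2

    for label, chisq, df in [("H0  (four factors)",     246.36, 186),
                             ("H0' (simple structure)", 301.42, 231)]:
        p_value = chi2.sf(chisq, df)   # Prob{chi-square with df d.f. >= chisq}
        rel_aic = chisq - 2 * df       # AIC(H) - AIC(saturated)
        print(f"{label}: p = {p_value:.4f}, relative AIC = {rel_aic:.2f}")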
This result by AIC is in complete agreement with Jöreskog's conclusion. The conventional
theory of statistics does not tell us how to choose the level of significance appropriate to
each particular application, and by itself offers no hope of arriving at a similar conclusion.
Obviously
the objective procedure of model selection by an information criterion can be fully imple-
mented to define an automatic factor analysis procedure. Such a possibility is discussed
by Bozdogan and Ramirez (1987).
parametrization, and the minimum AIC procedure for model selection is considered to be
a realization of the well-known empirical principle of parsimony in statistical modeling.
However, the application of the minimum AIC procedure assumes the existence of proper
maximum likelihood estimates of the models considered. The frequent occurrence of
improper solutions in the maximum likelihood factor analysis means that the models are
often too heavily overparametrized for the direct application of the method of maximum
likelihood. This suggests the necessity of further control of the likelihood function, which
can be realized by the use of some proper Bayesian modeling.
Before going into the discussion of this Bayesian modeling we will first note the essentially
Bayesian character of the factor analysis model and point out that the minimum AIC
procedure is concerned with the problem of the selection of a Bayesian model. In the basic
factor analysis model y = Λx + u the vector of observations y is assumed, given x, to be
distributed as a Gaussian distribution with mean Λx and unique variance Ψ. The vector of
factor scores x is unobservable but is assumed to be distributed as a Gaussian distribution
with zero mean and variance I_{k×k}. Since x is never observed this distribution is simply a
psychological construction for the explanation of the behavior of y. Under the assumption
that Λ is fixed, the distribution of x specifies the prior distribution of the mean of the
observation y. Thus we can see that the choice of k, the number of factors, is essentially
the choice of a Bayesian model. Incidentally, the recognition of the Bayesian character of
the factor analysis model also suggests the use of the posterior distribution of x for the
estimation of the factor scores, as is discussed by Bartholomew (1981).
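Written out, the marginalization underlying this Bayesian reading is the following standard computation (a LaTeX sketch of the step, using the notation of the text):

    % Integrating the factor scores x out of the model y = \Lambda x + u:
    y \mid x \sim N(\Lambda x,\ \Psi), \qquad x \sim N(0,\ I_{k \times k}),
    % hence
    p(y) = \int p(y \mid x)\, p(x)\, dx = N\!\left(0,\ \Lambda\Lambda' + \Psi\right),
    % which recovers the covariance structure \Sigma = \Lambda\Lambda' + \Psi
    % of the conventional maximum likelihood factor analysis.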
The basic problem in the use of a Bayesian model is how to justify the use of a
subjectively constructed model. Our belief is that it is possible only by considering various
possibilities as alternative models and comparing them by an objectively defined criterion.
In particular we propose the use of the log likelihood, or the AIC when some parameters
are estimated by the method of maximum likelihood, as the criterion of fit.
Let us consider the likelihood of a factor analysis model as a Bayesian model. For a
Bayesian model specified by the data distribution p(· | θ) and prior distribution p(θ), its
likelihood with respect to the observed data y is given by the marginal, or integrated,
likelihood ∫ p(y | θ)p(θ) dθ. For the factor analysis model, integrating out the factor scores
x yields, up to a multiplicative constant,

L(Λ, Ψ) = |Σ|^{−N/2} exp{−(N/2) tr Σ⁻¹S},

where Σ = ΛΛ′ + Ψ, S denotes the sample covariance matrix, and N the sample size.
This is exactly the likelihood function used in the conventional maximum likelihood
factor analysis. Thus the maximum likelihood estimates of Λ and Ψ in the classical sense
are the maximum likelihood estimates of the unknown parameters of a Bayesian model.
The above result shows that the AIC criterion defined for the factor analysis model is
actually the ABIC criterion for the evaluation of a Bayesian model with parameters
estimated by the method of maximum likelihood, where ABIC is defined by (Akaike, 1980)

ABIC = (−2) maximum log likelihood of a Bayesian model
       + 2(number of estimated parameters).
In the case of the factor analysis model we have
ABIC = AIC.
This identity clearly shows that there is no essential distinction between the classical and
Bayesian models when they are viewed from the point of view of the information criterion.
Consider the quantity

q = (−2/N) log L,

where the log likelihood log L is defined in the preceding section and N denotes the sample
size. By ignoring an additive constant we have

q = −log |Σ⁻¹S| + tr Σ⁻¹S.
Putting D = Ψ^{1/2} and C = D⁻¹Λ, we have

Σ = D(I + CC′)D,

and

|Σ⁻¹S| = |(I + CC′)⁻¹| |D⁻¹SD⁻¹|.
The modified negative log likelihood q can conveniently be expressed by using the
eigenvectors zᵢ and eigenvalues ζᵢ of D⁻¹SD⁻¹, the standardized sample covariance
matrix. Define the matrix Z by

Z = (z₁, z₂, …, z_p).

It is assumed that Z is normalized so that Z′Z = ZZ′ = I holds. Represent C by Z in the form

C = ZF.

Adopt the representation

FF′ = Σ_{i=1}^{p} μᵢ mᵢmᵢ′,

and, writing λᵢ = 1 + μᵢ, we get

tr (I + CC′)⁻¹D⁻¹SD⁻¹ = tr (Σᵢ λᵢ⁻¹ Zmᵢmᵢ′Z′)(Σⱼ ζⱼ zⱼzⱼ′)
                      = Σᵢ Σⱼ λᵢ⁻¹ ζⱼ mᵢ²(j),

where mᵢ(j) denotes the j-th element of mᵢ. The last relation is obtained from the equation

zⱼ′Zmᵢ = mᵢ(j).
We also have

|(I + CC′)⁻¹| |D⁻¹SD⁻¹| = Π_{i=1}^{p} λᵢ⁻¹ · Π_{j=1}^{p} ζⱼ.
Thus we get the following representation of the modified negative log likelihood function
as a function of λ = (λ₁, λ₂, …, λ_p) and m = (m₁, m₂, …, m_p):

q(λ, m) = Σ_{i=1}^{p} Σ_{j=1}^{p} λᵢ⁻¹ ζⱼ mᵢ²(j) + Σ_{i=1}^{p} log λᵢ − Σ_{j=1}^{p} log ζⱼ.

Assume that the ζᵢ and λᵢ are arranged in descending order, that is, ζ₁ ≥ ζ₂ ≥ ⋯ ≥ ζ_p and
λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_p, where λ_{k+1} = ⋯ = λ_p = 1. Then the successive minimization of
q(λ, m) with respect to m_p, m_{p−1}, …, m₁ leads to

q(λ) = Σ_{i=1}^{k} [λᵢ⁻¹ζᵢ − log (λᵢ⁻¹ζᵢ)] + Σ_{i=k+1}^{p} (ζᵢ − log ζᵢ),

which is minimized with respect to λ by putting λᵢ = ζᵢ for i ≤ k* and λᵢ = 1 otherwise,
where ζᵢ > 1 for i ≤ k* and ζᵢ < 1 otherwise. This last quantity is equal to the quantity
given by Equation (18) of Jöreskog (1967, p. 448) and is (−2/N) times the maximum log
likelihood of the factor analysis model when D is given.
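A small numerical sketch of this profile function (the eigenvalues are hypothetical; λᵢ is set to the minimizing value for i ≤ k, as described above):

    import numpy as np

    # q(lambda) from the text: sum_{i<=k} [zeta_i/lam_i - log(zeta_i/lam_i)]
    #                        + sum_{i>k} (zeta_i - log zeta_i).
    def q_profile(lam, zeta, k):
        r = zeta[:k] / lam[:k]        # lambda_i^{-1} zeta_i for i = 1..k
        return np.sum(r - np.log(r)) + np.sum(zeta[k:] - np.log(zeta[k:]))

    zeta = np.array([3.2, 1.8, 0.9, 0.6])  # hypothetical eigenvalues of D^{-1}SD^{-1}
    k = 2
    lam = np.maximum(zeta[:k], 1.0)        # minimizing choice: lambda_i = zeta_i when > 1
    print(q_profile(lam, zeta, k))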
In maximizing the likelihood we would normally hope that too small a value of some of the
diagonal elements of D would reduce the maximum likelihood of the corresponding model.
However, the above result shows that this is not the case: the value of the maximum
likelihood is sensitive only to the behavior of the smaller eigenvalues of D⁻¹SD⁻¹. A very
small diagonal element of D will only produce a very large eigenvalue. Thus the process of
maximizing the likelihood with respect to the elements of D does not eliminate the
possibility of some of these elements going down to zero.
The form of q(λ) shows that if we introduce an additive term ρ Σ μᵢ with ρ > 0, then the
minimization of

q(λ) = Σ_{i=1}^{p} [λᵢ⁻¹ζᵢ − log (λᵢ⁻¹ζᵢ)] + ρ Σ_{i=1}^{p} μᵢ,

with respect to λ does not allow any of the λᵢ (= 1 + μᵢ) to go to infinity. Taking into
account the relations C = ZF and FF′ = Σ μᵢmᵢmᵢ′ we get

Σ_{i=1}^{p} μᵢ = tr FF′ = tr CC′.
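The effect on a single term can be made explicit. Fixing one eigenvalue ζ and minimizing ζ/λ − log(ζ/λ) + ρ(λ − 1) over λ, the stationarity condition −ζ/λ² + 1/λ + ρ = 0, that is, ρλ² + λ − ζ = 0, gives a closed-form minimizer. The one-dimensional Python sketch below (which ignores the coupling between D and C in the full problem) shows how a huge eigenvalue, of the kind produced by a vanishing diagonal element of D, is strongly shrunk once ρ > 0:

    import numpy as np

    # Minimizer of zeta/lam - log(zeta/lam) + rho*(lam - 1) over lam:
    # rho*lam^2 + lam - zeta = 0  =>  lam = (-1 + sqrt(1 + 4*rho*zeta)) / (2*rho).
    def lam_hat(zeta, rho):
        if rho == 0.0:
            return zeta            # unpenalized: lam = zeta, unbounded as zeta grows
        return (-1.0 + np.sqrt(1.0 + 4.0 * rho * zeta)) / (2.0 * rho)

    for zeta in (2.0, 10.0, 1000.0):   # a huge zeta mimics a vanishing element of D
        print(zeta, [round(lam_hat(zeta, rho), 2) for rho in (0.0, 0.1, 1.0)])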
TABLE 1
Communality estimates*
Harman: eight physical variables

ρ = 0 (MLE)
k\i     1     2     3     4     5     6     7     8
 1    842   865   810   813   240   171   123   199
 2    830   893   834   801   911   636   584   463
 3    872  1000   806   844   909   641   589   509

ρ = 0.1
k\i     1     2     3     4     5     6     7     8
 1    837   858   804   810   241   172   124   200
 2    828   881   828   800   855   647   591   476
 3    858   910   830   832   859   650   590   523
 4    865   910   832   843   851   689   649   521
 5    same as above

ρ = 1.0
k\i     1     2     3     4     5     6     7     8
 1    763   768   725   739   252   181   134   204
 2    766   781   742   743   590   486   440   409
 3-5  same as above
6. Numerical Examples
The Bayesian model defined with the standard spherical prior distribution of the factor
loadings was applied to six published examples of improper solutions. These examples are
Harman's eight physical variables data (Harman, 1960, p. 82), with p = 8 and improper at
k = 3; the Davis data (Rao, 1955, p. 110), with p = 9 and improper at k = 2; Maxwell's
normal children data (Maxwell, 1961, p. 55), with p = 10 and improper at k = 4; the
Emmett data (Lawley & Maxwell, 1971, p. 43), with p = 9 and improper at k = 5;
Maxwell's neurotic children data (p. 53), with p = 10 and improper at k = 5; and Harman's
twenty-four psychological variables data (Harman, 1960, p. 137), with p = 24 and improper
at k = 6.
The informational point of view suggests that the hyperparameter ρ of the prior distribution
TABLE 2
Communality estimates
Davis data

ρ = 0 (MLE)
k\i     1     2     3     4     5     6     7     8     9
 1    658   661   228   168   454   800   705   434   703
 2    652  1000   243   168   464   816   704   435   701
 3   1000   661   220   204   451  1000   701   488   696

ρ = 0.1
k\i     1     2     3     4     5     6     7     8     9
 1    653   656   226   167   451   790   700   431   697
 2    694   689   227   171   470   800   698   434   696
 3    701   695   251   197   470   801   698   444   696
 4    same as above

ρ = 1.0
k\i     1     2     3     4     5     6     7     8     9
 1    596   598   210   156   415   702   633   400   631
 2-4  same as above
where B(i, j) denotes the (i, j)th element of B. The estimates of the communalities are
defined by diag (D₁B₁B₁′D₁). The process is repeated until convergence is established.

When ρ = 0 the above procedure produced maximum likelihood solutions that were
confirmed by a procedure based on the result of Jennrich and Robinson (1969). When
ρ > 0 the solution may only be considered as an arbitrary approximation to the posterior
TABLE 3
Three-factor maximum likelihood solution of Emmett data
mode. Nevertheless it will be sufficient for the purpose of confirming the effect of the
tempering of the likelihood function. For convenience we will call this solution the
Bayesian estimate.
In the case of the above six examples the choice of ρ = 1.0 produced solutions with a
significant overall reduction of the communalities, or increase of the specific variances.
With the choice of ρ = 0.1 solutions were usually close to the conventional maximum
likelihood estimates but with the improper estimates of communalities suppressed.
Improper estimates disappeared completely, unless ρ was made extremely small. For a
fixed ρ the estimates of the communalities usually stabilized as k, the number of factors,
was increased.
It was generally observed that when the maximum likelihood method produced an improper
solution first at k = k₀ the corresponding Bayesian estimate with ρ = 0.1 was proper but
with only one communality estimate inflated compared with the estimate at k = k₀ − 1.
Such a singular increase of a communality means the reinterpretation of a part of the
specific variation as an independent factor. This fact and the result of our analysis of the
likelihood function suggest that the singular increase of the communality is usually caused
by the overparametrization that makes the estimate sensitive to the sampling variability of
the data, rather than by a structural change of the best fitting model at k = k₀. This is in
agreement with the earlier observation of Tsumura and Sato (1981) on the nature of
improper solutions.
Tables 1 and 2 provide estimates of the communalities of Harman's eight physical variables
data and of Davis' data, respectively, for various choices of the order k and of ρ. In the
TABLE 4
Communality estimates by various procedures
Emmett data

ρ = 0 (MLE)
k\i     1     2     3     4     5     6     7     8     9
 1    510   537   300   548   390   481   525   224   665
 2    538   536   332   809   592   778   597   256   782
 3    550   573   384   788   619   823   600   538   769
 4    554   666   379   772   663   856   648   480   759
 5    556   868  1000   780   664   836   666   464   743

ρ = 0.1
k\i     1     2     3     4     5     6     7     8     9
 1    502   529   296   545   388   478   516   221   652
 2    535   531   330   790   588   762   590   252   753
 3    549   561   378   783   611   786   590   399   750
 4-5  same as above

ρ = 1.0
k\i     1     2     3     4     5     6     7     8     9
 1    425   448   254   478   344   422   434   189   540
 2    433   450   261   522   391   472   445   196   551
 3-5  same as above
case of the Harman data the result in Table 1 shows that the improper value 1000 at i = 2
with k = 3, obtained with ρ = 0, disappeared for positive values of ρ. In particular, with
ρ = 0.1, the solutions with k = 2, 3 and 4 are all mutually very close and they are close to
the solutions with ρ = 0 and k = 2 and 3, except for the improper component at k = 3.
This suggests that the two-factor model is an appropriate choice, which is in agreement
with Harman's original observation. The solution with ρ = 1.0 conforms with this
observation.
For the Davis data with k₀ = 2 the non-uniqueness of the convergence of iterative
procedures for the maximum likelihood was first reported by Tsumura, Fukutomi, and
Asoo (1968). With k = 2, Jöreskog (1967, p. 474) reported an improper estimate of the
specific variance for the 1st component and Tsumura et al. (p. 57) found one for the 8th
component. As is shown in Table 2 our procedure found one at the 2nd component. The result
TABLE 5
Suggested numbers of factors for the six examples*

Data                       p    k₀    k_s   MAICE
Harman: eight physical     8     3     2     ∞**
Davis                      9     2     1     ∞**
Maxwell: normal           10     4     3     ∞**
Emmett                     9     5     2      3
Maxwell: neurotic         10     5     2      3
Harman: 24 variables      24     6     5      5

*  p: dimension of observation; k₀: lowest order with improper solution;
   k_s: order suggested by the Bayesian analysis.
** ∞ denotes the saturated model.
given in Table 4 of Martin and McDonald (1975, p. 515) also suggests the existence of an
improper solution with zero unique variance for the 2nd component. These results suggest
the existence of local maxima of the likelihood function. Table 2 also gives improper
estimates for the 1st and 6th components with k = 3, which is in agreement with the result
reported by Jöreskog.
The estimates obtained with ρ = 0.1 may be viewed as practically identical and are close
to the solution with ρ = 0, the maximum likelihood estimate, for k = 1. This result
strongly suggests that the improper solutions are spurious in the sense that they can be
suppressed by mild tempering of the likelihood function. The one-factor model seems a
reasonable choice in this case. The solution with ρ = 1.0 conforms with the present
observation.
The phenomenon of the singular increase of a communality estimate is observed even
with k < k₀. Such an example is given by the three-factor maximum likelihood solution of
the Emmett data. The maximum likelihood solution by Lawley and Maxwell (1971, p. 43)
is reproduced in Table 3, which suggests the singular increase of the communality of the
8th component at k = 3. In Table 4 the estimate with ρ = 0.1 shows a substantial increase
of communality only at the 8th component at k = 3, compared with the estimate at k = 2.
The increase is completely suppressed with ρ = 1.0. This result suggests that the high
value of the communality estimate of the 8th component at k = 3 obtained with ρ = 0 is
spurious. A similar phenomenon was observed with Maxwell's data of neurotic children
for the 2nd component at k = 3.
Tsumura and Sato (1981, p. 163) report that, in their experience, improper solutions were
always accompanied by "quasi-specific factors" that showed singular contributions to
particular specific variances. The above example shows that our present Bayesian approach
can detect the appearance of such a factor even before one gets a definitely improper
solution. Thus we can expect that the present approach will realize a reasonable control
of improper solutions.
Table 5 summarizes the suggested choices of the number of factors for the six
examples where the choices by the minimum AIC procedure, MAICE, are also included.
The suggested choices are based on subjective judgments of the numerical results. It is
quite desirable to develop a numerical procedure for the evaluation of the likelihood of
each Bayesian model to arrive at an objective judgment.
It is interesting to note here that by a proper choice of ρ the Bayesian approach can
produce an estimate of Λ even with k = p. This illustrates the drastic change of emphasis
between the conventional and the Bayesian approaches to modeling. In the Bayesian
approach there is no particular point in trying to reduce the number of factors. To avoid
unnecessary distortion of the model it is even advisable to adopt a large value of k and to
control the estimation procedure by a proper choice of ρ.
7. Concluding Remarks
It is remarkable that the idea of factor analysis has provided so much stimulus to the
development of statistical modeling. In terms of the structure of the model it is essentially
Bayesian. Nevertheless, the practical use of the model was realized by the application of
the method of maximum likelihood, and this eventually led to the introduction of AIC.
The concept of the information measure underlying the introduction of AIC directs our
attention from the parameters to the distribution. This then provides a conceptual
framework for the handling of Bayesian modeling as a natural extension of conventional
statistical modeling. The occurrence of improper solutions in the maximum likelihood
factor analysis is a typical example that reveals the limitations of conventional modeling.
The introduction of the standard spherical prior distribution of the factor loadings provided
an example of overcoming these limitations by a proper Bayesian modeling.
This series of experiences clearly shows the close relation between factor analysis and
AIC, or the informational point of view of statistics, and illustrates their contribution to
the development of general statistical methodology. It is hoped that this close contact
between psychometrics and statistics will be maintained in the future and contribute to
the advancement of both fields.
References
Akaike, H. (1969). Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics,
21, 243-247.
Akaike, H. (1970). Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22,
203-217.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov &
F. Csaki (Eds.), 2nd International Symposium on Information Theory (pp. 267-281). Budapest: Akademiai
Kiado.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control,
AC-19, 716-723.
Akaike, H. (1980). Likelihood and the Bayes procedure. In J. M. Bernardo, M. H. De Groot, D. V. Lindley, &
A. F. M. Smith (Eds.), Bayesian Statistics (pp. 143-166). Valencia: University Press.
Akaike, H. (1985). Prediction and entropy. In A. C. Atkinson & S. E. Fienberg (Eds.), A Celebration of Statistics
(pp. 1-24). New York: Springer-Verlag.
Bartholomew, D. J. (1981). Posterior analysis of the factor model. British Journal of Mathematical and Statistical
Psychology, 34, 93-99.
Bozdogan, H., & Ramirez, D. E. (1987). An expert model selection approach to determine the "best" pattern
structure in factor analysis models. Unpublished manuscript.
Harman, H. H. (1960). Modern Factor Analysis. Chicago: University of Chicago Press.
Jennrich, R. I., & Robinson, S. M. (1969). A Newton-Raphson algorithm for maximum likelihood factor
analysis. Psychometrika, 34, 111-123.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443-482.
Jöreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43, 443-477.
Lawley, D. N., & Maxwell, A. E. (1971). Factor Analysis as a Statistical Method, 2nd Edition. London: Butterworths.
Martin, J. K., & McDonald, R. P. (1975). Bayesian estimation in unrestricted factor analysis: a treatment for
Heywood cases. Psychometrika, 40, 505-517.
Maxwell, A. E. (1961). Recent trends in factor analysis. Journal of the Royal Statistical Society, Series A, 124,
49-59.
Rao, C. R. (1955). Estimation and tests of significance in factor analysis. Psychometrika, 20, 93-111.
Tsumura, Y., Fukutomi, K., & Asoo, Y. (1968). On the unique convergence of iterative procedures in factor
analysis. TRU Mathematics, 4, 52-59. (Science University of Tokyo).
Tsumura, Y., & Sato, M. (1981). On the convergence of iterative procedures in factor analysis. TRU Mathematics, 17, 159-168. (Science University of Tokyo).