Principal Component Analysis
Received 28th October 2013, Accepted 17th February 2014
DOI: 10.1039/c3ay41907j
www.rsc.org/methods

Principal component analysis is one of the most important and powerful methods in chemometrics as well as in a wealth of other areas. This paper provides a description of how to understand, use, and interpret principal component analysis. The paper focuses on the use of principal component analysis in typical chemometric areas but the results are generally applicable.
Table 1 Chemical parameters determined on the wine samples (data from http://www.models.life.ku.dk/Wine_GCMS_FTIR [February, 2014])1,2

Ethanol (vol%)
Total acid (g L⁻¹)
Volatile acid (g L⁻¹)
Malic acid (g L⁻¹)
pH
Lactic acid (g L⁻¹)
Rest sugar (Glu + Fru) (g L⁻¹)
Citric acid (mg L⁻¹)
CO2 (g L⁻¹)
Density (g mL⁻¹)
Total polyphenol index
Glycerol (g L⁻¹)
Methanol (vol%)
Tartaric acid (g L⁻¹)

samples from Australia and South Africa whereas there are less pronounced regional differences in the ethanol content.

Even though Fig. 2 may suggest that there is little relevant regional information in ethanol, it is dangerous to rely too much on univariate analysis. In univariate analysis, any co-variation with other variables is explicitly neglected and this may lead to important features being ignored. For example, plotting ethanol versus glycerol (see Fig. 3) shows an interesting correlation between the two. This is difficult to deduce from plots of the individual variables. If glycerol and ethanol were completely correlated, it would, in fact, be possible to simply use e.g. the average or the sum of the two as one new variable that could replace the two original ones. No information would be lost as it would always be possible to go from e.g. the average to the two original variables.

This concept of using suitable linear combinations of the original variables will turn out to be essential in PCA and is explained in a bit more detail and a slightly unusual way here. The new variable, say, the average of the two original ones, can be defined as a weighted average of all 14 variables; only the other variables will have weight zero. These 14 weights are shown in Fig. 4. Rather than having the weights of ethanol and glycerol be 0.5 as they would in an ordinary average, they are chosen as 0.7 to make the whole 14-vector of weights a unit vector. When the original variables ethanol and glycerol are taken to be of length one (unit length), it is convenient to also have the linear combination of those be of length one. This then defines the unit on the combined variable. To achieve this it is necessary to take 0.7 (√2/2 to be exact) of ethanol and 0.7 of glycerol, as simple Pythagorean geometry shows in Fig. 5. This also carries over to more than two variables.

Using a unit weight vector has certain advantages. The most important one is that the unit vector preserves the size of the variation. Imagine there are ten variables rather than two that are being averaged. Assume, for simplicity, that all ten have the value five. Regardless of whether the average is calculated from two or ten variables, the average remains five. Using the unit vector, though, will provide a measure of the number of variables showing variation. In fact, the variance of the original variables and this newly calculated one will be the same if the original variables are all correlated. Thus, using the unit vector preserves the variation in the data and this is an attractive property.
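As a minimal numerical sketch of this point (the numbers are invented and are not the wine data): with two perfectly correlated variables of equal variance, an ordinary average halves the total variation, whereas the unit-norm weights retain it.

```python
import numpy as np

# Made-up readings for two perfectly correlated variables with equal variance
ethanol = np.array([13.1, 12.5, 13.8, 12.9, 13.4])
glycerol = ethanol + 1.0                             # same variance as ethanol

X = np.column_stack([ethanol, glycerol])
w_avg = np.array([0.5, 0.5])                         # ordinary average weights
w_unit = np.array([np.sqrt(2) / 2, np.sqrt(2) / 2])  # unit-norm weights (~0.707 each)

print(X.var(axis=0, ddof=1).sum())   # total variance of the two variables
print((X @ w_avg).var(ddof=1))       # ordinary average: variance is halved
print((X @ w_unit).var(ddof=1))      # unit-norm weights: total variance is preserved
```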
Fig. 4 Defining the weights for a variable that includes only ethanol and glycerol information.
Fig. 6 Using defined weights to calculate a new variable that is a scaled average of ethanol and glycerol (arbitrary numbers used here). The
average is calculated as the inner product of the 14 measurements of a sample and the weight vector. Some didactical rounding has been used in
the example.
Fig. 9 The first principal component of the wine data. The lower plot shows the weights and the upper plot shows the weighted averages obtained with those weights.

sometimes called a latent variable where, in contrast, the original variables are manifest.

The data are collected in a matrix X with I rows (i = 1, ..., I; usually samples/objects) and J columns (j = 1, ..., J; usually variables), hence of size I × J. The individual variables (columns) of X are denoted by xj (j = 1, ..., J) and are all vectors in the I-dimensional space. A linear combination of those x variables can be written as t = w1x1 + ... + wJxJ, where t is now a new vector in the same space as the x variables (because it is a linear combination of these). In matrix notation, this becomes t = Xw, with w being the vector with elements wj (j = 1, ..., J). Since the matrix X contains variation relevant to the problem, it seems reasonable to have as much as possible of that variation also in t. If this amount of variation in t is appreciable, then it can serve as a good summary of the x variables. Hence, the fourteen variables of X can then be replaced by only one variable t, retaining most of the relevant information.

The variation in t can be measured by its variance, var(t), defined in the usual way in statistics. The problem then translates to maximizing this variance by choosing optimal weights w1, ..., wJ. There is one caveat, however: multiplying an optimal w by an arbitrarily large number will make the variance of t arbitrarily large as well. Hence, to have a proper problem, the weights have to be normalized. This is done by requiring that their norm, i.e. the sum of squared values, is one (see Fig. 5).

Explained variation

The variance of t can now be calculated, but a more meaningful assessment of the summarizing capability of t is obtained by calculating how representative t is in terms of replacing X. This can be done by projecting the columns of X on t and calculating the residuals of that projection. This is performed by regressing all variables of X on t using the ordinary regression equation

X = tp^T + E   (3)

where p is the vector of regression coefficients and E is the matrix of residuals. Interestingly, p equals w, and the whole machinery of regression can be used to judge the quality of the summarizer t. Traditionally, this is done by calculating

(||X||^2 − ||E||^2) / ||X||^2 × 100%   (4)

which is referred to as the percentage of explained variation of t. In Fig. 10, it is illustrated how the explained variation is calculated, as also explained around eqn (4).

Note that the measures above are called variations rather than variances. In order to talk about variances, it is necessary to correct for the degrees of freedom consumed by the model and this is not a simple task. Due to the non-linear nature of the PCA model, degrees of freedom are not as simple to define as for linear models such as in linear regression or analysis of variance. Hence, throughout this paper, the magnitude of variation will simply be expressed in terms of sums of squares. For more information on this, refer to the literature.3,4
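A short sketch of eqn (3) and (4) on random placeholder data (not the wine measurements): the loading p obtained by regressing the columns of X on t coincides with the weight vector w, and the percentage of explained variation follows from the sums of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(44, 14))          # placeholder data matrix (I x J)
X = X - X.mean(axis=0)                 # mean-centre

# One-component model: w is the first right singular vector, t = Xw
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w = Vt[0]
t = X @ w

# Regress all columns of X on t (eqn (3)); the loading p equals w
p = X.T @ t / (t @ t)
E = X - np.outer(t, p)

# Percentage of explained variation, eqn (4)
explained = 100 * (np.sum(X**2) - np.sum(E**2)) / np.sum(X**2)
print(np.allclose(p, w), explained)
```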
PCA as a model

Eqn (3) highlights an important interpretation of PCA: it can be seen as a modelling activity. By rewriting eqn (3) as

X = tp^T + E = X̂ + E,   (5)
Fig. 10 Exemplifying how explained variation is calculated using the data and the residuals.

shows that the (outer-) product tp^T serves as a model of X (indicated with a hat). In this equation, vector t was a fixed regressor and vector p the regression coefficient to be found. It can be shown that actually both t and p can be established from such an equation5 by solving

argmin_{t,p} ||X − tp^T||^2   (6)

which is also a standard problem in linear algebra and has the same solution as eqn (2). Note that the solution does not change if t is premultiplied by a ≠ 0 and simultaneously p is divided by that same value. This property is called the scaling ambiguity6 and it can be solved in different ways. In chemometrics, the vector p is normalized to length one (||p|| = 1) and in psychometrics, t is normalized to length one. The vector t is usually referred to as the score vector (or scores in shorthand) and the vector p is called the loading vector (or loadings in shorthand). The term 'principal component' is not clearly defined and can mean either the score vector or the loading vector or the combination. Since the score and loading vectors are closely tied together, it seems logical to reserve the term principal component for the pair t and p.

Taking more components

If the percentage of explained variation of eqn (4) is too small, then the t, p combination is not a sufficiently good summarizer of the data. Eqn (5) suggests an extension by writing

X = TP^T + E = t1p1^T + ... + tRpR^T = X̂ + E   (7)
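An R-component model of the form in eqn (7) can be obtained from the singular value decomposition; the sketch below uses random placeholder data (not the wine data) and the chemometric convention of unit-norm loadings with the variance carried by the scores.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(44, 14))
Xc = X - X.mean(axis=0)               # mean-centred data

R = 3
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :R] * s[:R]                  # scores (I x R)
P = Vt[:R].T                          # loadings (J x R), unit norm

E = Xc - T @ P.T                      # residual matrix

# The model is also the sum of rank-one terms t_r p_r^T, as in eqn (7)
X_hat = sum(np.outer(T[:, r], P[:, r]) for r in range(R))
print(np.allclose(X_hat, T @ P.T))                 # True
print(100 * (1 - np.sum(E**2) / np.sum(Xc**2)))    # % explained variation
```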
History of PCA

PCA has been (re-)invented several times. The earliest presentation was in terms of eqn (6).7 This interpretation stresses the modelling properties of PCA and is very much rooted in regression-thinking: variation explained by the principal components (Pearson's view). Later, in the thirties, the idea of taking linear combinations of variables was introduced8 and the variation of the principal components was stressed (eqn (1); Hotelling's view). This is a more multivariate statistical approach. Later, it was realized that the two approaches were very similar.

Similar, but not the same. There is a fundamental conceptual difference between the two approaches, which is important to understand. In the Hotelling approach, the principal components are taken seriously in their specific direction. The first component explains the most variation, the second component the second most, etc. This is called the principal axis property: the principal components define new axes which should be taken seriously and have a meaning. PCA finds these principal axes. In contrast, in the Pearson approach it is the subspace which is important, not the axes as such. The axes merely serve as a basis for this subspace. In the Hotelling approach, rotating the principal components destroys the interpretation of these components, whereas in the Pearson conceptual model rotations merely generate a different basis for the (optimal) subspace.9

Visualization and interpretation

It is now time to discuss how a PCA model can be visualized. There are four parts of a PCA model: the data, the scores, the loadings and the residuals. Visualization of the actual data is often dependent on the type of data and the traditions of a given field. For continuous data such as time-series and spectra, it is
often feasible to plot the data as curves whereas more discrete data are often plotted in other ways such as bar plots.

Visualizing and interpreting residuals. Whatever visualization applies to the data would often also be useful for e.g. the residuals (Fig. 11). The residuals have the same structure and, for example for spectral data, the residuals would literally correspond to the residual spectra and therefore provide important chemical information as to what spectral variation has not been explained (see also Fig. 23). In short, any visualization that is useful for the data will also be useful for the residuals.

Fig. 12 Score 1 and 2 from a PCA on the autoscaled wine data. Upper left is a line plot of the 44 score values in component 1 and lower left the 44 score values of component 2. In the right plot, the two scores are plotted against each other.

Fig. 13 Score plot of component 1 versus 2 (top) and 1 versus 13 (bottom). To the left, the plots are shown filling out the squares and to the right, they are shown preserving distances.

Residuals can also be plotted as histograms or e.g. normal probability plots in order to see if the residuals are normally distributed. Alternatively, the residuals can be used for
calculating the explained or unexplained variation as explained earlier.

Visualizing and interpreting scores. It is well known that the readings of a variable can be plotted. Imagine that pH is measured on 20 samples. These 20 values can be plotted in a multitude of ways. Scores are readings in exactly the same way as any other variable and can hence be plotted and interpreted in many different ways. In Fig. 12, some visualizations of the first two components of the PCA model of the wine data are shown. If desired, they can be plotted as line plots as shown to the left in the figure. This plot of, for example, score 1, shows that the dark blue samples tend to have negative scores. This means that wines from Chile have relatively less of what this first component represents, which will be described by the loadings (see below).

Instead of plotting the scores in line plots, it is also possible to plot them in scatter plots. In Fig. 12 (right), such a scatter plot is shown and from the scatter plot it is more readily seen that there seem to be certain groupings in the data. For example, the Australian and Chilean wines seem to be almost distinctly different in this score plot. This suggests that it is possible to classify a wine using these measured variables. If a new sample ends up in the middle of the Chilean samples, it is probably not an Australian wine and vice versa. This possibility of using PCA for classification forms the basis for the classification method called SIMCA (Soft Independent Modelling of Class
Analogies).10,11 The scatter plot can be interpreted in the same way that scatter plots are normally interpreted. For example, a plot of glycerol versus ethanol (Fig. 3) is simple to interpret. Samples that are close have similar glycerol and ethanol. Likewise for a scatter plot of components 1 and 2: samples that are close are similar in terms of what the components represent, which is defined by the loading vectors. Also, if (and only if) the two components represent all or almost all of the variation in the data, then e.g. two closely lying samples are similar with respect to the actual data.

Fig. 18 Scatter plot of the preprocessed values of the variables of two wines (left) and two variables (right).

Fig. 19 Scree plots for simulated rank four data with various levels of noise. Top plots show eigenvalues. Middle plots show the same but zoomed in on the y-axis to the line indicated in the top plot. Lower plots show the logarithm of the eigenvalues.

It is possible to assess similarities and differences among samples in terms of the raw data. If two components explain all or most of the variation in the data, then a score scatter plot will reflect distances in terms of the data directly if the scores are shown on the same scale. That is, the plot must be shown as original scores where the basis is the loading vector. As the loading vectors are unit vectors, they reflect the original data and if the two axes in the plot use the same scale, then distances can be read from the plots directly. If, on the other hand, the plots are not shown using the same scale on both axes, then assessing distances is not possible.

Compare the two versions of the two score plots in Fig. 13. The lower left plot has widely different scales on the two axes (because one component has much larger values numerically than the other). Hence, it is similar to plotting e.g. kilometres on one axis and meters on another. A map with such axes does not preserve distance. Consider, for example, the wine sample marked A. It seems to be closer to sample C than B in the lower left plot. The plot to the lower right preserves distances and here it is readily verified that sample A is, in fact, the closest to B in the space spanned by the two components.

There are several points worth mentioning in relation to this. Score plots are only indicative of the specific fraction of variance they explain. For example, scores that explain three percent do not imply much with respect to the raw data. To assess relative positions such as distances in a score plot, the plot needs to
preserve distances. This is mostly a problem in practice when the magnitude of the two components is widely different. The score plot that does not preserve distances is still useful. For example, the lower left score plot in Fig. 13 is much better for discerning groupings and detecting patterns than the one to the lower right.

Fig. 20 Scree plot for the autoscaled wine data. The decision lines for having eigenvalues larger than one and the broken stick are also shown.

Fig. 21 Cumulated percentage variation explained.

Fig. 22 Left: score number four of wine data. Right: score two versus score four.

Visualizing and interpreting loadings. Loadings define what a principal component represents. Just as the weight in Fig. 4 defined the latent variable to represent a mixture of glycerol and ethanol, the loading vector of a PCA model does exactly the same. It defines what linear combination of the variables a particular component represents.

Fig. 14 shows the loadings of the two first components. With these, it is possible to explain what the scores of the model represent. For example, wines from Chile have low (negative) scores for component 2 (Fig. 12). This implies that they have a lot of the opposite of the phenomenon represented in loading 2. Hence, these samples have variation where ethanol, total, volatile, and lactic acids are low at the same time (relatively) while e.g. malic acid is high. Also, and this is an important point, certain variables that have loadings close to zero, such as e.g. citric acid, do not follow this trend. Hence, the loading tells about what the trend is and also which variables are not part of the trend.

The phenomenon reflected in the principal component is also expected to be visually apparent in the raw data, but only with respect to how much variation of the data this component describes. The first component is seen in the label in Fig. 14 to explain 24.4% of the variation whereas the second one explains 21.3%. Together that means that 45.7% of the variation is explained by these two components. If the two components had explained 100%, all information would be contained in these two components, but for this particular model, half the variation is still retained in other components, so we should remain cautious not to claim that observations from the components are fully indicative of variations in the data.

An example of the importance of this is indicated in Fig. 15. The model reflected in the top plot shows that variables 4 and 6 are perfectly oppositely correlated. The model reflected in the bottom plot does not indicate that. In contrast, the low percentage explained indicates that there are many other phenomena in the data, so the correlation between variables 4 and 6 need not be close to minus one as it will be in the first model.
Instead of looking at the loadings in line plots, it is also feasible to make scatter plots (Fig. 16). The scatter plot is often helpful for finding patterns of variation. For example, it is apparent in the plot that volatile acid and lactic acid are generally correlated in approximately 50% of the variation reflected in the two components. Residual sugar seems to be only moderately described in these two components as it is close to zero in both components. As the variables have been auto-scaled, a position close to zero implies that this particular variable does not co-vary with the variation that components 1 and 2 are reflecting.

As for the score scatter plot, distances are only preserved in the loading scatter plot if the two loadings are plotted on the same scale. The basis for the loadings are the scores and these are generally not unit vectors as they carry the variance of the components. To correct for that, it is possible to simply normalize the scores and multiply the corresponding loading vectors by the inverse normalization factor; in essence, just moving the variance from the score vector to the loading vector.

Visualizing and interpreting loadings and scores together – biplots. It is possible and obvious to link the score and the loading plot. That way, it is possible to explain why e.g. a certain grouping is observed in a score plot. As hinted above, it is difficult to find a suitable basis to plot on when combining scores and loadings, especially if preserving distances is desired. The biplot aims to solve this problem, or rather, presents a suitable set of compromises to choose from. Biplots were originally developed by K. R. Gabriel, but J. C. Gower has also contributed. The reader is urged to refer to the original literature for more in-depth information.12–14

The principle behind biplots can be explained by representing the PCA model using

X = TP^T = T(norm)SP^T   (9)
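The identity in eqn (9) only moves the component variances between the scores and a diagonal matrix; the sketch below (random placeholder data; reading T(norm) as unit-length score vectors and S as the diagonal matrix of their norms is an assumption on my part) checks that both factorizations give the same model.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(44, 14))
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

R = 2
T = U[:, :R] * s[:R]          # scores carrying the variance
P = Vt[:R].T                  # unit-norm loadings

norms = np.linalg.norm(T, axis=0)
T_norm = T / norms            # unit-length score vectors
S = np.diag(norms)            # diagonal matrix of score norms

# Both factorizations reconstruct the same rank-R model of X (eqn (9))
assert np.allclose(T @ P.T, T_norm @ S @ P.T)
```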
whole loading matrix to ensure that e.g. the score values are not so small compared to the loadings that they are not visible in a plot.

There are many interesting aspects of biplots and scatterplots but only a few important interpretational issues will be described here.

Two objects that are close and far from the origin have similar response (with respect to the variation explained by the components). For example, the two samples CHI-VDA1 and CHI-SCH1 are far from the origin and close together. Hence they are expected to be correlated, but only with respect to the approximately 50% that these two components describe. The two samples are plotted against each other in Fig. 18 (left). Note that it is the preprocessed data that the PCA model reflects and hence that interpretations can be made about.

Likewise, two variables that are close and far from the origin are correlated (with respect to the variation explained by the components). An example is given in Fig. 18 (right). Note that the high correlation is apparently governed by an extreme sample – a potential outlier which will be discussed later.

The center of the plot represents the average sample – not zero – in case the data have been centered. Hence, samples with very negative scores have low values relative to the other samples and samples with high positive scores are the opposite. Again, with respect to the variation explained by the components.

The larger projection a sample has on the vector defined by a given variable, the more that sample deviates from the average on that particular variable (see e.g. how sample SOU-HHI1 projects to the axis defined by the variable lactic acid in Fig. 17).

It is often overlooked that the above considerations for biplots apply equally well to loading plots or to score plots. Just like above, when, for example, loadings are plotted without considering the magnitude of the scores, distances may be impossible to judge.
fitting remove various aspects of the simplicity of PCA (orthogonal scores, nestedness of solutions, etc.).

Inference/validation

Since the PCA model parameters are used for interpretation and exploration, it is reasonable to ask how stable the results are. This calls for statistical inference tools. There are different routes to take in this respect. Upon assuming multivariate normality of the x-variables, statistical inference for the scores and loadings is available (see e.g. Anderson,18 pp. 468). Multivariate normality cannot always be assumed, but approximate normality of the scores – they are linear combinations – can sometimes be assumed by invoking the Central Limit Theorem. For a distribution-free approach, resampling methods can be used, e.g., bootstrapping. This is, however, not trivial and several alternatives exist.19,20

Preprocessing

Often a PCA performed on the raw data is not very meaningful. In regression analysis, an intercept or offset is often included since it is the deviation from such an offset which represents the interesting variation. In terms of the prototypical example, the absolute level of pH is not that interesting but the variation in pH of the different Cabernets is relevant. For PCA to focus on this type of variation it is necessary to mean-center the data. This is simply performed by subtracting from every variable in X the corresponding mean level.

Sometimes it is also necessary to think about the scales of the data. In the wine example, there were measurements of concentrations and of pH. These are not on the same scales (not even in the same units) and to make the variables more comparable, the variables are scaled by dividing them by the corresponding standard deviations. The combined process of centering and scaling in this way is often called autoscaling. For a more detailed account of centering and scaling, see the literature.21,22

Centering and scaling are the two most common types of preprocessing and they normally always have to be decided upon. There are many other types of preprocessing methods available, though. The appropriate preprocessing typically depends on the nature of the data investigated.23–27
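In code, the autoscaling described above amounts to subtracting each column mean and dividing by each column standard deviation; a minimal sketch (the function name is mine, not from the paper):

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and divide by its sample standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)       # ddof=1: sample standard deviation
    return (X - mean) / std, mean, std

# New samples must be preprocessed with the calibration mean and std:
# X_new_scaled = (X_new - mean) / std
```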
A basic rationale in PCA is that the informative rank of the data is less than the number of original variables. Hence, it is possible to replace the original J variables with R (R < J) components and gain a number of benefits. The influence of noise is minimized as the original variables are replaced with weighted averages,28 and the interpretation and visualization are greatly aided by having a simpler (fewer variables) view of all the variations. Furthermore, the compression of the variation into fewer components can yield statistical benefits in further modelling with the data. Hence, there are many good reasons to use PCA. In order to use PCA, though, it is necessary to be able to decide on how many components to use. The answer to that problem depends a little bit on the purpose of the analysis, which is why the following three sections will provide different answers to that question.

Exploratory studies. In exploratory studies, there is no quantitatively well-defined purpose with the analysis. Rather, the aim is often to just 'have a look at the data'. The short answer to how many components to use then is: "just use the first few components". A slightly more involved answer is that in exploratory studies, it is quite common not to fix the number of components very accurately. Often, the interest is in looking at the main variation and, by definition, the first components provide information on that. As e.g. components one and three do not change regardless of whether component six or seven is included, it is often not too critical to establish the exact number of components. Components are looked at and interpreted from the first component and downwards. Each extra component is less and less interesting as the variation explained is smaller and smaller, so often a gradual decline of interest is attached to components. Note that this approach for assessing the importance of components is not to be taken too literally. There may well be reasons why smaller variations are important for a specific dataset.29

If outliers are to be diagnosed with appropriate statistics (see next section), then, however, it is more important to establish the number of components to use. For example, the residual will change depending on how many components are used, so in order to be able to assess residuals, a reasonable number of components must be used. There are several ad hoc approaches that can be used to determine the number of components. A selection of methods is offered below, but note that these methods seldom provide clear-cut and definitive answers. Instead, they are often used in a combined way to get an impression of the effective rank of the data.

Eigenvalues and their relation to PCA. Before the methods are described, it is necessary to explain the relation between PCA and eigenvalues. An eigenvector of a (square) matrix A is defined as a nonzero vector z with the following property:

Az = λz   (11)

where λ is called the eigenvalue. If matrix A is symmetric (semi-)positive definite, then the full eigenvalue decomposition of A becomes

A = ZΛZ^T   (12)

where Z is an orthogonal matrix and Λ is a diagonal matrix. In chemometrics, it is customary to work with covariance or correlation matrices and these are symmetric (semi-)positive definite. Hence, eqn (12) describes their eigenvalue decomposition. Since all eigenvalues of such matrices are nonnegative, it is customary to order them from high to low and refer to the first eigenvalue as the largest one.

The singular value decomposition of X (I × J) is given by

X = USV^T   (13)

where U is an (I × J) orthogonal matrix (U^TU = I); S (J × J) is a diagonal matrix with the nonzero singular values on its diagonal; and V is a (J × J) orthogonal matrix (V^TV = VV^T = I). This is for the case of I > J, but the other cases follow similarly. Considering X^TX and upon using eqn (12) and (13), it follows that

X^TX = VS^TU^TUSV^T = VS^2V^T = ZΛZ^T.   (14)

This shows the relationship between the singular values and the eigenvalues. The eigenvalue corresponding to a component is the same as the squared singular value, which again is the variation of the particular component.
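The relation in eqn (13) and (14) is easy to verify numerically; the sketch below (random placeholder matrix, not the wine data) checks that the eigenvalues of X^TX equal the squared singular values of X.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(44, 14))
X = X - X.mean(axis=0)

s = np.linalg.svd(X, compute_uv=False)        # singular values of X (descending)
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]   # eigenvalues of X^T X, high to low

print(np.allclose(s**2, eigvals))             # True: eigenvalue = squared singular value
```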
Scree test. The scree test was developed by R. B. Cattell in 1966.30 It is based on the assumption that relevant information is larger than random noise and that the magnitude of the variation of random noise seems to level off quite linearly with the number of components. Traditionally, the eigenvalues of the cross-product of the preprocessed data are plotted as a function of the number of components, and when only noise is modelled, it is assumed that the eigenvalues are small and decline gradually. In practice, it may be difficult to see this in the plot of eigenvalues due to the huge eigenvalues, and often the logarithm of the eigenvalues is plotted instead. Both are shown in Fig. 19 for a simulated dataset of rank four and with various amounts of noise added. It is seen that the eigenvalues level off after four components, but the details are difficult to see in the raw eigenvalues unless zoomed in. It is also seen that the distinction between 'real' and noise eigenvalues is difficult to discern at high noise levels.

For real data, the plots may even be more difficult to use, as also exemplified in the original publication of Cattell as well as in many others.31–33 Cattell himself admitted that: "Even a test as simple as this requires the acquisition of some art in administering it". This, in fact, is not particular to the scree test but goes for all methods for selecting the number of components.

For the wine data, it is not easy to firmly assess the number of components based on the scree test (Fig. 20). One may argue that seven or maybe nine components seem feasible, but this would imply incorporating components that explain very little variation. A more obvious choice would probably be to assess three components as suitable based on the scree plot and then be aware that further components may also contain useful information.

Eigenvalue below one. If the data is autoscaled, each variable has a variance of one. If all variables are orthogonal to each other, then every component in a PCA model would have an eigenvalue of one since the preprocessed cross-product matrix (the correlation matrix) is the identity. It is then fair to say that if a component has an eigenvalue larger than one, it explains variation of more than one variable. This has led to the rule of selecting all components with eigenvalues exceeding one (see the red line in Fig. 20). It is sometimes also referred to as Kaiser's rule or the Kaiser–Guttman rule and many additional arguments have been provided for this method.34–36 While it remains a very ad hoc approach, it is nevertheless a useful rule-of-thumb to get an idea about the complexity of a dataset. For the wine data (Fig. 20), the rule suggests that around four or five components are reasonable. Note that for very precise data, it is perfectly possible that even components with eigenvalues far below one can be real and significant. Real phenomena can be small in variation, yet accurate.

Broken stick. A more realistic cut-off for the eigenvalues is obtained with the so-called broken stick rule.37 A line is added to the scree plot that shows the eigenvalues that would be expected for random data (the green line in Fig. 20). This line is calculated assuming that random data will follow a so-called broken stick distribution. The broken stick distribution hypothesizes how random variation will partition and uses the analogy of how the lengths of pieces of a stick will be distributed when broken at random places into J pieces.38 It can be shown that for autoscaled data, this theoretical distribution can be calculated as

b_r = Σ_{j=r}^{J} 1/j.   (15)

As seen in Fig. 20, the broken stick would seem to indicate that three to four components are reasonable.
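Eqn (15) is straightforward to compute; a possible sketch (the function name is mine):

```python
import numpy as np

def broken_stick(J):
    """Broken stick values b_r = sum_{j=r}^{J} 1/j for r = 1..J (eqn (15))."""
    inv = 1.0 / np.arange(1, J + 1)
    return np.cumsum(inv[::-1])[::-1]   # reverse cumulative sum of 1/j

# For autoscaled data the eigenvalues sum to J, and so do the b_r values,
# so the two can be compared directly in a scree plot.
print(broken_stick(14)[:4])
```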
High fraction of variation explained. If the data measured have e.g. one percent noise, it is expected that PCA will describe all the variation down to around one percent. Hence, if a two-component model describes only 50% of the variation and is otherwise sound, it is probable that more components are needed. On the other hand, if the data are very noisy, coming e.g. from process monitoring or consumer preference mapping, and have an expected noise fraction of maybe 40%, then an otherwise sound model fitting 90% of the variation would imply overfitting and fewer components should be used. Having knowledge of the quality of the data can help in assessing the number of components. In Fig. 21, the variation explained is shown. The plot is equivalent to the eigenvalue plot except that it is cumulative and on a different scale. For the wine data, the uncertainty is different for each variable, and varies from approximately 5% up to 50% relative to the variation in the data. This is quite variable and makes it difficult to estimate how much variation should be explained, but most certainly less than 50% would mean that all is not explained, and explaining more than, say, 90–95% of the variation would be meaningless and just modelling of noise. Therefore, based on variation explained, it is likely that there are more than two but fewer than, say, seven components.

Valid interpretation. As indicated by the results, the different rules above seldom agree. This is not as big a problem as it might seem. Quite often, the only thing needed is to know the neighbourhood of how many components are needed. Using the above methods 'informally' and critically will often provide that answer. Furthermore, one of the most important strategies for selecting the number of components is to supplement such methods with interpretations of the model. For the current data, it may be questioned whether e.g. three or four components should be used.

In Fig. 22, it is shown that there is distinct structure in the scores of component four. For example, the wines from Argentina all have positive scores. Such a structure or grouping will not happen accidentally unless unfortunate confounding has occurred. Hence, as long as Argentinian wines were not measured separately on a different system or something
can reliably improve classification. In order to be certain that useful information is retained in the PCA model, it is generally advised to validate the number of components in terms of the actual goal. Instead of validating the number of components that best describe X in some sense (PCA cross-validation), it will often make more sense to use the number of components that provides the best classification results if PCA is used in conjunction with discriminant analysis.

Detecting outliers

Outliers are samples that are somehow disturbing or unusual. Often, outliers are downright wrong samples. For example, in determining the height of persons, five samples are obtained ([1.78, 1.92, 1.83, 167, 1.87]). The values are in meters but accidentally, the fourth sample has been measured in centimeters. If the sample is not either corrected or removed, the subsequent analysis is going to be detrimentally disturbed by this outlier. Outlier detection is about identifying and handling such samples. An alternative or supplement to outlier handling is the use of robust methods, which will, however, not be treated in detail here. The reader is referred to the literature for more details on robust methods.48–59

This section is mainly going to focus on identifying outliers, but understanding the outliers is really the critical aspect. Often outliers are mistakenly taken to mean 'wrong samples' and nothing could be more wrong! Outliers can be absolutely right, but e.g. just badly represented. In such a case, the solution is not to remove the outlier, but to supplement the data with more of the same type. The bottom line is that it is imperative to understand why a sample is an outlier. This section will give the tools to identify the samples and see in what way they differ. It is then up to the data analyst to decide how the outliers should be handled.

Data inspection. An often forgotten, but important, first step in data analysis is to inspect the raw data. Depending on the type of data, many kinds of plots can be relevant as already mentioned. For spectral data, line plots may be nice. For discrete data, histograms, normal probability plots, or scatter plots could be feasible. In short, any kind of visualization that will help elucidate aspects of the data can be useful. Several such plots have already been shown throughout this paper. It is also important, and frequently forgotten, to look at the preprocessed data. While the raw data are important, they actually never enter the modeling. It is the preprocessed data that will be modeled and there can be big differences in the interpretations of the raw and the preprocessed data.

Score plots. While raw and preprocessed data should always be investigated, some types of outliers will be difficult to identify from there. The PCA model itself can provide further information. There are two places where outlying behavior will show up most evidently: in the scores and in the residuals. It is appropriate to go through all selected scores and look for samples that have strange behaviour. Often, it is only components one and two that are investigated but it is necessary to look at all the relevant components.

As for the data, it is a good idea to plot the scores in many ways, using different combinations of scatter plots, line plots, histograms, etc. Also, it is often useful to go through the same plot but coloured by all the various types of additional information available. This could be any kind of information such as temperature, storage time of sample, operator or any other kind of either qualitative or quantitative information available. For the wine data model, it is seen (Fig. 26) that one sample is behaving differently from the others in score plot one versus two (upper left corner).

Fig. 26 Score plot of a four component PCA model of the wine data.

Looking at the loading plot (Fig. 16) indicates that the sample must be (relatively) high in volatile and lactic acid and low in malic acid. This should then be verified in the raw data. After removing this sample, the model is rebuilt and reevaluated. No more extreme samples are observed in the scores.

Before deciding on what to do with an outlier, it is necessary to look at how important the component is. Imagine a sample that is doing an 'excellent job' in the first seven components, but in the eighth has an outlying behaviour. If that eighth component is very small in terms of variation explained and not the most important for the overall use of the model, then it is probably not urgent to remove such a sample.

Whenever in doubt as to whether to remove an outlier or not, it is often instructive to compare the models before and after removal. If the interpretation or intended use changes dramatically, it indicates that the sample has an extreme
T_i^2 = t_i^T (T^T T / (I − 1))^{-1} t_i   (18)

where T is the matrix of scores (I × R) from all the calibration samples and t_i is an R × 1 vector holding the R scores of the ith sample. Assuming that the scores are normally distributed, confidence limits for T_i^2 can be assigned as

T_i^2(I, R) = [R(I − 1)/(I − R)] F_{R,I−R,α}   (19)
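A sketch of eqn (18) and (19), using random placeholder scores and scipy's F distribution for the quantile (variable names are mine):

```python
import numpy as np
from scipy.stats import f

# T: scores of the calibration samples (I x R); placeholder random values here.
rng = np.random.default_rng(1)
I, R = 44, 4
T = rng.normal(size=(I, R))

# Hotelling's T^2 per sample, eqn (18): t_i^T (T^T T/(I-1))^{-1} t_i
S_inv = np.linalg.inv(T.T @ T / (I - 1))
T2 = np.einsum('ir,rs,is->i', T, S_inv, T)

# 95% confidence limit, eqn (19), using the upper alpha quantile of F
alpha = 0.05
limit = R * (I - 1) / (I - R) * f.ppf(1 - alpha, R, I - R)
print(T2.max(), limit)
```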
Fig. 28 PCA scores plot (1 vs. 2) for a dataset consisting of ten concentrations of trace elements in obsidian samples from four specific quarries –
data from a study by Kowalski et al.64
Score contribution plots. When a sample has been detected as being an outlier, it is often interesting to try to investigate the reason. Extreme scores indicate that the sample has high levels of whatever the specific component reflects in its corresponding loading vector. Sometimes, it is difficult to verify directly what is going on and the so-called contribution plot can help. There are several different implementations of contribution plots65 but one common version was originally developed by Nomikos.66 The contribution for a given sample indicates what variables caused that sample to get an extreme set of scores. For a given set of components (e.g. components one and two in Fig. 29), this contribution can be calculated as

c_j^D = Σ_{r=1}^{R} [t_r^new / (t_r^T t_r / (I − 1))] x_j^new p_jr   (20)

The vector t_r is the rth score vector from the calibration model, I is the number of samples in the calibration set and t_r^new is the score of the sample in question. It can come from the calibration set or be a new sample. x_j^new is the data of the sample in question for variable j and p_jr is the corresponding loading element. In this case, R components are considered, but fewer components can also be considered by adjusting the summation in eqn (20).
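A possible implementation of eqn (20) for the simple case of orthogonal loadings (a sketch; the function name and argument layout are not from the paper):

```python
import numpy as np

def score_contributions(T, P, x_new, t_new, comps=None):
    """Per-variable contributions to a sample's scores, in the spirit of eqn (20).

    T: calibration scores (I x R), P: loadings (J x R),
    x_new: preprocessed sample (length J), t_new: its scores (length R),
    comps: indices of the components to include (default: all R).
    """
    I = T.shape[0]
    comps = range(T.shape[1]) if comps is None else comps
    c = np.zeros(x_new.shape[0])
    for r in comps:
        weight = t_new[r] / (T[:, r] @ T[:, r] / (I - 1))  # score scaled by its variance
        c += weight * x_new * P[:, r]                      # per-variable contribution
    return c
```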
The contribution plot indicates what variables make the selected sample have an extreme Hotelling's T2 and in Fig. 29, the most influential variables are also the ones that are visible in the raw data (not shown). Eqn (20) explains the simplest case of contribution plots with orthogonal P matrices. Generalized contributions are available for non-orthogonal cases.65 Note that if x_j^new is a part of the calibration set, it influences the model. A more objective measure of whether x_j^new fits the model can be obtained by removing it from the data and then afterwards projecting it onto the model, thereby obtaining more objective scores and residuals.

Lonely wolfs. Imagine a situation where the samples are constituted by distinct groups rather than one distribution, as also exemplified in Fig. 28. Hotelling's T2 is not the most obvious choice for detecting samples that are unusually positioned but not far from the center. A way to detect such samples is to measure the distance of the sample to the nearest neighbor. This can also be generalized e.g. to the average distance to the k nearest neighbors, and various distance measures can be used if so desired.

Fig. 30 Score plot of Fig. 28. Samples are coloured according to the distance of the sample to the nearest neighbour.

In Fig. 30, it is seen that colouring the scores by the distance to the nearest neighbour highlights that there are, in fact, several samples that are not very close to other samples. When the samples are no longer coloured by class as shown in Fig. 28, it is much less obvious that the green 'K' class is indeed a well-defined class.
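The nearest-neighbour distance used for colouring Fig. 30 could be computed as in the following sketch (the function name is mine):

```python
import numpy as np

def nn_distance(T):
    """Distance from each sample to its nearest neighbour in score space.

    T: scores (I x R). Returns a length-I vector; large values flag 'lonely wolfs'.
    """
    D = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=-1)  # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)                                  # ignore self-distance
    return D.min(axis=1)
```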
Fig. 31 Influence plot of wine data with a four component PCA model.

Residuals. The use of residuals has already been described in detail. For outlier detection, it is common to use the sum of squared residuals, often called the Q-statistic, of each sample to look for samples that are not well described by the PCA model. When Q is plotted against T2, it is often referred to as an influence plot. Note that both residuals and T2 will change with the number of components, so if the number of components is not firmly defined, it may be necessary to go back and forth a bit between different numbers of components.
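In code, the Q-statistic is just the row-wise sum of squared residuals of the preprocessed data; a minimal sketch (names are mine), whose output can be plotted against Hotelling's T2 to give an influence plot:

```python
import numpy as np

def q_statistic(X, T, P):
    """Q (sum of squared residuals) per sample for an R-component PCA model.

    X: preprocessed data (I x J), T: scores (I x R), P: loadings (J x R).
    """
    E = X - T @ P.T              # residual matrix
    return np.sum(E**2, axis=1)  # one Q value per sample (row)
```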
In the influence plot in Fig. 31, it is clear that one sample stands out with a high Hotelling's T2 in the PCA model and no samples have extraordinarily large residuals. It will hence be reasonable to check the T2 contribution plot of that sample, to see if an explanation for the extreme behavior can be obtained. The two blue lines are 95% confidence levels. Such lines are often given in software but should not normally be the focus of attention, as also described above for score plots.

Residual contribution plots. Just as contribution plots for scores can be defined, contribution plots for residual variation can be determined as well. These are simpler to define, as the contributing factor to a high residual is simply the squared residual vector itself. Hence, if a sample shows an extraordinary residual variation, the residual contribution plot (the residuals of the sample) can indicate why the sample has high residual variation. The squared residuals do not reveal the sign of the deviation and sometimes the raw residuals are preferred to the squared ones to allow the sign to be visible.67

Conclusion

Principal component analysis is a powerful and versatile method capable of providing an overview of complex multivariate data. PCA can be used e.g. for revealing relations between variables and relations between samples (e.g. clustering), detecting outliers, finding and quantifying patterns, generating

References

1 T. Skov, D. Ballabio and R. Bro, Multiblock variance partitioning: a new approach for comparing variation in multiple data blocks, Anal. Chim. Acta, 2008, 615, 18–29.
2 D. Ballabio, T. Skov, R. Leardi and R. Bro, Classification of GC-MS measurements of wines by combining data dimension reduction and variable selection techniques, J. Chemom., 2008, 22, 457–463.
3 K. Faber, Degrees of freedom for the residuals of a principal component analysis – A clarification, Chemometrics and Chemoinformatics, 2008, vol. 93, pp. 80–86.
4 H. Martens, S. Hassani, E. M. Qannari and A. Kohler, Degrees of freedom estimation in Principal Component Analysis and Consensus Principal Component Analysis, Chemom. Intell. Lab. Syst., 2012, 118, 246–259.
5 J. M. F. ten Berge, Least squares optimization in multivariate analysis, DSWO Press, Leiden, 1993.
6 A. K. Smilde, H. C. J. Hoefsloot, H. A. L. Kiers, S. Bijlsma and H. F. M. Boelens, Sufficient conditions for unique solutions within a certain class of curve resolution models, J. Chemom., 2001, 15(4), 405–411.
7 K. Pearson, On lines and planes of closest fit to points in space, Philos. Mag., 1901, 2, 559–572.
8 H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol., 1933, 24, 417–441.
9 J. M. F. ten Berge and H. A. L. Kiers, Are all varieties of PCA the same? A reply to Cadima & Jolliffe, Br. J. Math. Stat. Psychol., 1997, 50(2), 367–368.
10 S. Wold, C. Albano, W. J. Dunn, III, U. Edlund, K. H. Esbensen, P. Geladi, S. Hellberg, E. Johansson, W. Lindberg and M. Sjöström, Multivariate data analysis in chemistry, in Chemometrics. Mathematics and Statistics in Chemistry, ed. B. R. Kowalski, D. Reidel Publishing Company, Dordrecht, 1984, pp. 17–95.
11 I. E. Frank and S. Lanteri, Classification models: discriminant analysis, SIMCA, CART, Chemom. Intell. Lab. Syst., 1989, 5, 247–256.
12 J. C. Gower, A general theory of Biplots, in Recent Advances in Descriptive Multivariate Analysis, ed. W. J. Krzanowski, Clarendon Press, Oxford, 1995, pp. 283–303.
13 A. Carlier and P. M. Kroonenberg, Decompositions and biplots in three-way correspondence analysis, Psychometrika, 1996, 61, 355–373.
14 K. R. Gabriel, The biplot graphic display of matrices with application to principal component analysis, Biometrika, 1971, 58, 453–467.
15 R. Bro, N. D. Sidiropoulos and A. K. Smilde, Maximum likelihood fitting using simple least squares algorithms, J. Chemom., 2002, 16(8–10), 387–400.
16 D. T. Andrews and P. D. Wentzell, Applications of maximum likelihood principal component analysis: incomplete data sets and calibration transfer, Anal. Chim. Acta, 1997, 350(3), 341–352.
17 P. D. Wentzell, D. T. Andrews, D. C. Hamilton, N. M. Faber and B. R. Kowalski, Maximum likelihood principal component analysis, J. Chemom., 1997, 11(4), 339–366.
18 T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, 2nd edn, 1984.
19 M. E. Timmerman, H. A. L. Kiers and A. K. Smilde, Estimating confidence intervals for principal component loadings: a comparison between the bootstrap and asymptotic results, Br. J. Math. Stat. Psychol., 2007, 60(2), 295–314.
20 H. Babamoradi, F. van den Berg and Å. Rinnan, Bootstrap based confidence limits in principal component analysis – a case study, Chemom. Intell. Lab. Syst., 2013, 120, 97–105.
21 R. A. van den Berg, H. C. J. Hoefsloot, J. A. Westerhuis, A. K. Smilde and M. van der Werf, Centering, scaling, and transformations: improving the biological information content of metabolomics data, BMC Genomics, 2006, 7(142).
22 R. Bro and A. K. Smilde, Centering and scaling in component analysis, J. Chemom., 2003, 17(1), 16–33.
23 N. K. Afseth, V. H. Segtnan and J. P. Wold, Raman spectra of biological samples: a study of preprocessing methods, Appl. Spectrosc., 2006, 60(12), 1358–1367.
24 C. D. Brown, L. Vega-Montoto and P. D. Wentzell, Derivative preprocessing and optimal corrections for baseline drift in multivariate calibration, Appl. Spectrosc., 2000, 54(7), 1055–1068.
25 S. N. Deming, J. A. Palasota and J. M. Nocerino, The geometry of multivariate object preprocessing, J. Chemom., 1993, 7, 393–425.
26 H. Martens and E. Stark, Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy, J. Pharm. Biomed. Anal., 1991, 9(8), 625–635.
27 M. Pardo, G. Niederjaufner, G. Benussi, E. Comini, G. Faglia, G. Sberveglieri, M. Holmberg and I. Lundstrom, Data preprocessing enhances the classification of different brands of Espresso coffee with an electronic nose, Sens. Actuators, B, 2000, 69(3), 397–403.
28 R. Bro, Multivariate calibration – What is in chemometrics for the analytical chemist?, Anal. Chim. Acta, 2003, 500(1–2), 185–194.
29 O. E. de Noord, The influence of data preprocessing on the robustness and parsimony of multivariate calibration models, Chemom. Intell. Lab. Syst., 1994, 23, 65–70.
30 R. B. Cattell, The scree test for the number of factors, Multivariate Behav. Res., 1966, 1, 245–276.
31 P. M. Bentler and K. H. Yuan, Test of linear trend in eigenvalues of a covariance matrix with application to data analysis, Br. J. Math. Stat. Psychol., 1996, 49(2), 299–312.
32 P. M. Bentler and K. H. Yuan, Tests for linear trend in the smallest eigenvalues of the correlation matrix, Psychometrika, 1998, 63, 131–144.
33 R. C. Henry, E. S. Park and C. Spiegelman, Comparing a new algorithm with the classic methods for estimating the number of factors, Chemom. Intell. Lab. Syst., 1999, 48, 91–97.
34 H. F. Kaiser, The Application of Electronic Computers to Factor Analysis, Educ. Psychol. Meas., 1960, 20, 141–151.
35 N. Cliff, The Eigenvalues-Greater-Than-One Rule and the Reliability of Components, Psychol. Bull., 1988, 103(2), 276–279.
36 L. Guttman, Some necessary conditions for common-factor analysis, Psychometrika, 1954, 19(2), 149–161.
37 S. Frontier, Étude de la décroissance des valeurs propres dans une analyse en composantes principales: comparaison avec le modèle du bâton brisé, J. Exp. Mar. Biol. Ecol., 1976, 25, 67–75.
38 R. H. MacArthur, On the relative abundance of bird species, Proc. Natl. Acad. Sci. U. S. A., 1957, 43(3), 293–295.
39 S. Wold, Cross-validatory estimation of the number of components in factor and principal components models, Technometrics, 1978, 20, 397–405.
40 W. J. Krzanowski and P. Kline, Cross-validation for choosing the number of important components in principal component analysis, Multivariate Behav. Res., 1995, 30(2), 149–165.
41 R. Bro, K. Kjeldahl, A. K. Smilde and H. A. L. Kiers, Cross-validation of component models: a critical look at current methods, Anal. Bioanal. Chem., 2008, 390, 1241–1251.
42 T. C. Gleason and R. Staelin, A proposal for handling missing data, Psychometrika, 1975, 40, 229–252.
43 P. R. C. Nelson, P. A. Taylor and J. F. MacGregor, Missing data methods in PCA and PLS: score calculations with incomplete observations, Chemom. Intell. Lab. Syst., 1996, 35(1), 45–65.
44 B. Grung and R. Manne, Missing values in principal component analysis, Chemom. Intell. Lab. Syst., 1998, 42, 125–139.
45 B. Walczak and D. L. Massart, Dealing with missing data Part I, Chemom. Intell. Lab. Syst., 2001, 58(1), 15–27.
46 E. Adams, B. Walczak, C. Vervaet, P. G. Risha and D. L. Massart, Principal component analysis of dissolution data with missing elements, Int. J. Pharm., 2002, 234(1–2), 169–178.
47 J. Mandel, The partitioning of interaction in analysis of variance, J. Res. Natl. Bur. Stand., Sect. B, 1969, 73B, 309–328.
48 S. J. Devlin, R. Gnanadesikan and J. R. Kettenring, Robust estimation of dispersion matrices and principal components, J. Am. Stat. Assoc., 1981, 76(375), 354–362.
49 O. S. Borgen and Å. Ukkelborg, Outlier detection by robust alternating regression, Anal. Chim. Acta, 1993, 277, 489–494.
50 Y. Xie, J. Wang, Y. Liang, L. Sun, X. Song and R. Yu, Robust principal component analysis by projection pursuit, J. Chemom., 1993, 7, 527–541.
51 H. Hove, Y. Liang and O. M. Kvalheim, Trimmed object projections: a nonparametric robust latent-structure decomposition method, Chemom. Intell. Lab. Syst., 1995, 27, 33–40.
52 J. G. Chen, J. A. Bandoni and J. A. Romagnoli, Robust PCA and normal region in multivariate statistical process monitoring, AIChE J., 1996, 42(12), 3563–3566.
53 W. C. Chen, H. Cui and Y. Z. Liang, A new principal component analysis method based on robust diagnosis, Anal. Lett., 1996, 29, 1647–1667.
54 A. Singh, Outliers and robust procedures in some chemometric applications, Chemom. Intell. Lab. Syst., 1996, 33, 75–100.
55 E. V. Thomas and N. X. Ge, Development of robust multivariate calibration models, Technometrics, 2000, 42(2), 168–177.
56 K. A. Hoo, K. J. Tvarlapati, M. J. Piovoso and R. Hajare, A method of robust multivariate outlier replacement, Comput. Chem. Eng., 2002, 26(1), 17–39.
57 M. Hubert, P. J. Rousseeuw and K. Vanden Branden, ROBPCA: a new approach to robust principal component analysis, Technometrics, 2005, 47(1), 64–79.
58 S. F. Møller, J. V. F. Frese and R. Bro, Robust methods for multivariate data analysis, J. Chemom., 2005, 19, 549–563.
59 P. J. Rousseeuw, M. Debruyne, S. Engelen and M. Hubert, Robustness and outlier detection in chemometrics, Crit. Rev. Anal. Chem., 2006, 36(3–4), 221–242.
60 H. Hotelling, The generalization of Student's ratio, Ann. Math. Stat., 1931, 2, 360–378.
61 J. E. Jackson, Principal components and factor analysis: part I – principal components, J. Qual. Tech., 1980, 12, 201–213.
62 A. M. Mood, F. R. Graybill and D. C. Boes, Introduction to the Theory of Statistics, McGraw-Hill, 3rd edn, 1974.
63 P. Nomikos and J. F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics, 1995, 37, 41–59.
64 B. R. Kowalski, T. F. Schatzki and F. H. Stross, Classification of archaeological artifacts by applying pattern recognition to trace element data, Anal. Chem., 1972, 44(13), 2176–2180.
65 J. A. Westerhuis, S. P. Gurden and A. K. Smilde, Generalized contribution plots in multivariate statistical process monitoring, Chemom. Intell. Lab. Syst., 2000, 51(1), 95–114.
66 P. Nomikos, Detection and diagnosis of abnormal batch operations based on multiway principal component analysis, ISA Trans., 1996, 35, 259–266.
67 B. M. Wise, N. L. Ricker and D. Veltkamp, Upset and Sensor Failure Detection in Multivariate Processes, AIChE 1989 Annual Meeting, Nov. 1989.