Change History: February 2020. P Geladi updated the text, figures and references.
The role of ANOVA and noise in PCA was mentioned, which means that diagnostic tools for judging the meaningfulness of a PCA model should be explained. Most of the diagnostics benefit from visualization in plots, and this is introduced with a small example (Section “Diagnostics and Visualization for PCA”). The success of a PCA analysis depends strongly on how the data are pretreated, and this depends on the variable types and on where the data come from. Some of this is included in the examples in the “Diagnostics and Visualization for PCA” section.
2.03.1.2 History
The Singular Value Decomposition for rectangular matrices and its history are described by Stewart.1 More about the history of PCA
and SVD can be found in Jolliffe2 and a description of PCA for chemometrics was given in Ref. 3.
A simple form of SVD, the eigenvector/eigenvalue decomposition for square matrices, was introduced independently by Arthur Cayley in the UK and William Hamilton in Ireland during the middle 1850s. It is sometimes called the Cayley-Hamilton decomposition.
decomposition. Later in that decade, Jacques Hadamard in France gave a Cayley-Hamilton decomposition equation for complex
number square matrices. Eugenio Beltrami of Italy and Camille Jordan of France are the originators of the decomposition method
SVD for rectangular matrices in 1873–74. Because calculations had to be made by hand, only small matrices were used in those early
days. One should also mention Karl Pearson’s work in 1901 (Ref. 4) on the ANOVA decomposition of data matrices and an early paper by Fischer and MacKenzie from 1923 (Ref. 5) that used SVD on real data from the natural sciences. This is probably the first chemometrics/biometrics paper using multivariate data. An important article by the American Harold Hotelling in 1933 introduced the name “Principal Components” (Ref. 6). Many of these old papers can be found as free pdf files on the internet, and many more historical articles are mentioned in Refs. 2,3. Chemometrics appeared during the 1970s and much of the early chemometrics work used PCA.7,8
Showing vectors and matrices is best done graphically. Fig. 1 shows some definitions. One datum is just a number. Usually these
numbers are collected on many objects and then they form a column vector. Object is a generic name that can stand for samples,
trees, patients, fields etc. A row vector is data for one object. These data can be many things such as temperatures, pressures, densities,
viscosities, mass numbers, wavenumbers etc. The list can be made very long but the generic name is variable. If there are many vari-
ables measured on many objects, then a data matrix has to be formed by putting all row vectors under each other. Fig. 1 shows
a data matrix with I rows and J columns. Each datum in this matrix has coordinates (i,j) with i = 1,…,I and j = 1,…,J. It should
Fig. 1 Definition of a scalar, row and column vectors, and the data matrix (objects i = 1,…,I as rows; variables j = 1,…,J as columns).
be clear that long vectors or large matrices are hard to study just as collections of numbers. The size of data matrices is different in the
different sciences. In chemometrics and analytical chemistry, instrumentation has largely replaced single-element determinations by titration, so spectral data with many variables occur in XRF, NIR, IR, Raman, NMR, mass spectrometry etc. At the same time, the number of objects is not always so large, although in the process industries, with regular measurements made over long periods of time, many objects can be collected.
From the definition of a data matrix it is easy to define even larger structures; some examples are given in Fig. 2. One such array is a data matrix with objects and variables measured over time, giving a three-way array. Three-way arrays have their own methods of data analysis.9 The other array is a multivariate or hyperspectral image, where each point in the image is made up of a vector of wavelengths, wavelength bands etc. Such arrays are produced very frequently in satellite and airborne imaging, but laboratory equipment for making such images also exists.10–14 The literature on airborne/satellite imaging is huge and only a few examples are given here.15–17 For multivariate or hyperspectral images, PCA can be useful after rearranging the data.
An important aspect of data analysis to be taken up here is missing data and bad data. Missing data can always occur in any data matrix. A small amount (1–2%) of randomly spread missing data is acceptable. Large amounts of missing data, or missing data spread in a systematic way, are a problem; see Fig. 3. There are imputation methods that guess the value that fits in the hole and thus make the matrix complete. It is also possible to use algorithms that ignore the holes in the calculation. Some references are 18,19. Another thing to remember is that some manufacturers of instrumentation automatically fill gaps after a measurement, which makes the data matrix look free of gaps.
Fig. 2 Examples of larger data structures: a three-way array and a multivariate (hyperspectral) image.
Fig. 3 Left: A data matrix may have a small amount of randomly spread holes, and this is easily remedied. Right: Many holes, or a too systematic location of the holes, is bad, because the remedies will not work properly and will give misleading information.
The same as above is valid for outliers, i.e., erroneous or extreme data points. When such outliers are removed they leave holes. PCA is a very good technique for having a quick look at a large(r) data matrix and getting an overview. This overview allows one to find where the erroneous or bad data are, and once this is remedied a new PCA allows one to see the general trends in the data. More about this comes in later sections.
Data matrices may have extreme shapes. One example is an industrial biochemical process that takes 2 weeks and where temper-
ature, pressure, turbidity and viscosity are measured every 10 min in order to follow the reaction’s progress. This would create a data
matrix of size 2016 × 4. Another example is an experimental design of 11 runs where each run gives an IR spectrum of 1500 wavelengths. That would be a data matrix of 11 × 1500. In the first case variables were expensive to come by but frequent sampling was
no problem. In the second case it was easy to collect many variables but the samples were expensive to make. In “omics” sciences,
very large data matrices (many objects, many variables) may be obtained.
An important observation is that of the nature of the variables. In the IR spectrum example above, all variables are measured in
the same units and their order is important. Putting the wavelengths in random order would be nonsense for interpreting the results.
In a spectrum, wavelengths with low absorption values are less interesting than ones with high absorption values, so no rescaling is
needed. For the industrial example, the variables are all measured with different equipment and in different units and their order is
not important. Temperatures in degrees C and pressures in Pascal also have quite different numerical values and this means that the
ones with higher numerical values would dominate the analysis and ones with lower values would automatically end up in the
noise. Therefore, these data are usually scaled by their inverse standard deviation.
Some data have mixed variables. Part of the data are all in the same units (spectra, chromatograms) and in a certain order while
others are in different units and order does not matter (temperatures, pressures, viscosities). All these situations require some
thinking and clever scaling and some of this is explained in “Diagnostics and Visualization for PCA” section.
PCA and SVD cannot be understood without a minimum of linear algebra and there are good books on the subject.20–22 There are
strict mathematical definitions of the operations of linear algebra but here a simplified graphical definition will suffice. Fig. 4 shows
some important operations. For addition or subtraction, all matrices should be of the same size. Addition or subtraction is done
elementwise. The figure also shows the multiplication by a scalar. This means that every element in the matrix is multiplied by
the scalar. Per element the operation is:
vij = a xij + b yij + c zij, for all i and all j   (1)
Fig. 4 Upper: Addition/subtraction and scalar multiplication of matrices of equal size. V, X, Y, Z are matrices; a, b and c are scalars. Lower: The same operation for the column vectors v, x, y, z.
Fig. 4 also shows some nomenclature. A scalar (a, b, c) is represented by a lowercase character, and a matrix (V, X, Y, Z) by an uppercase boldface character. Vectors (v, x, y, z) are just matrices where the number of rows or columns is one; they are given by lowercase boldface characters.
If one adopts the convention that all vectors are column vectors, transposition is needed to make a row vector from a column vector. The superscript T is used to show this: x becomes xT, x transposed. Vectors and matrices can be multiplied by the dot product, as Fig. 5 (left) shows. A row vector xT multiplied by a column vector y gives a scalar a: xT·y = a. If the two vectors are identical, this scalar is the Sum of Squares (SS) of the vector: xT·x = a = SS(x). Fig. 5 (right) shows another product of two vectors, giving a matrix: y·xT = W. This is the outer product. Fig. 6 gives something similar, but now for matrices.
With the matrix/vector operations in Figs. 4 and 5 the PCA equation can be constructed in the next section.
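As a small added illustration (not from the original text), the NumPy sketch below reproduces these vector operations numerically; the vectors are arbitrary examples.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # column vector x (stored as a 1-D array)
y = np.array([0.5, -1.0, 2.0])

a = x @ y                          # inner product xT.y, a scalar
ss_x = x @ x                       # xT.x = sum of squares SS(x)
W = np.outer(y, x)                 # outer product y.xT, a matrix

print(a, ss_x)
print(W)
```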
Fig. 5 Multiplication of vectors. Left: The inner product of two vectors of the same size. Right: The outer product of two vectors of different size.
Fig. 6 An outer product of two matrices. Each point xkj in the matrix X is an inner product over I elements: xkj = sum over all i of (tki pij). The multiplication symbol can be left out if it is clear what operation is meant.
Fig. 7 The PCA equation X = T PT + E. If A is large enough (here A = K), E becomes a matrix of zeroes and can be left out.
Fig. 8 The SVD equation X = U S VT + E. If A is large enough (here A = K), E becomes a matrix of zeroes and can be left out. S is a diagonal matrix with the singular values on the diagonal. This is the same as Fig. 7 by setting T = US.
Fig. 9 The PCA equation for three components, Xmc = t1 p1T + t2 p2T + t3 p3T + E. Notes: Multiplication symbols are left out, vectors are represented by lines, not rectangles as before, and sizes are not given.
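To make the link between Figs. 7–9 concrete, here is a brief added sketch (assuming an arbitrary simulated data matrix) that computes the SVD of a mean-centered X and forms the scores as T = US and the loadings as the columns of V, so that Xmc ≈ T PT + E.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))            # I = 20 objects, K = 5 variables
Xmc = X - X.mean(axis=0)                # column-wise mean centering

U, s, Vt = np.linalg.svd(Xmc, full_matrices=False)
A = 3                                   # number of components kept
T = U[:, :A] * s[:A]                    # scores T = U S
P = Vt[:A].T                            # loadings P

E = Xmc - T @ P.T                       # residual after A components
print("explained SS fraction:", (s[:A]**2).sum() / (s**2).sum())
```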
Fig. 10 In a scatter plot of two mean-centered variables (Var1, Var2), a principal component is the line that maximizes the sum of squares of the projections on the line, the scores. The distances to the line (dashed arrows) are minimized in the sum of squares sense. The cosines of the angles a1 and a2 are the elements of the first loading vector p1.
The algorithms used to calculate principal component models are of two kinds: those that calculate all scores and loadings at once, using an SVD-like calculation, and those that calculate the components one by one and stop when the desired number of components is reached. One algorithm used for the latter is called NIPALS (Nonlinear Iterative PArtial Least Squares). Extended literature on these matters is found in the reference list.23–32
For large data sets where only a few components are needed, the one-by-one calculation may be the only feasible alternative.
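The following is a minimal NIPALS sketch for the one-by-one calculation; it is an illustrative implementation written for this text, not the code of any particular software package, and it assumes X has already been mean centered.

```python
import numpy as np

def nipals_pca(X, n_components, max_iter=500, tol=1e-10):
    """One-by-one extraction of principal components by NIPALS.

    X is assumed to be mean centered. Returns scores T (I x A) and
    loadings P (K x A). After each component the rank-one term t pT
    is subtracted (deflation) before the next one is computed."""
    X = X.copy()
    I, K = X.shape
    T = np.zeros((I, n_components))
    P = np.zeros((K, n_components))
    for a in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))]      # start from the variable with the largest variance
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)               # loading estimate
            p /= np.linalg.norm(p)              # normalize loading to unit length
            t_new = X @ p                       # score estimate
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        T[:, a], P[:, a] = t, p
        X = X - np.outer(t, p)                  # deflate before the next component
    return T, P

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
T, P = nipals_pca(X - X.mean(axis=0), 3)
```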
In the sum of squares sense, the sum of squares (SS) of X, SSX, is the sum of the SS of all principal components (SS1, SS2 and SS3 in the equation below for a model with three principal components) and the SS of the residual, SSE. The numerical example makes this clear. The first component explains as much as possible of the SS of X. The second component explains the maximum SS when SS1 is removed, etc.:
SSX = SS1 + SS2 + SS3 + SSE   (here for only three components, but this can be generalized to any number)
100 = 62 + 25 + 9 + 4
100% = 62% + 25% + 9% + 4%
Sum of squares values may be shown in an ANOVA table. Such an ANOVA table could be used for testing each component
against the residual in an F-test. More about choosing A is given in the next subsection and in the diagnostics section.
Source | SS | Degrees of freedom
SS1 | 62 | 1
SS2 | 25 | 1
SS3 | 9 | 1
SSE | 4 | I − 4
SSX | 100 | I − 1
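The SS decomposition in the ANOVA table can be checked numerically; in the added sketch below (simulated data), the squared singular values of the mean-centered matrix give the SS per component.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 6))
Xmc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xmc, full_matrices=False)
A = 3
ss_x = (Xmc**2).sum()                    # total SS of the mean-centered data
ss_components = s[:A]**2                 # SS explained by each of the first A components
ss_e = ss_x - ss_components.sum()        # residual SS

print("SS per component (%):", 100 * ss_components / ss_x)
print("SSE (%):", 100 * ss_e / ss_x)
print("check:", np.isclose(ss_x, ss_components.sum() + ss_e))
```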
Fig. 11 R2 and Q2 as a function of the number of components. R2 (circles) always increases as a function of the number of components. The cross-validation based Q2 (triangles) decreases or stagnates when the systematic information is used up and the components become noisy.
related eigenvalues, small and similar among themselves. Graphically, the point at which noise-related eigenvalues start appearing can be detected with the ‘broken stick’ approach (see Fig. 15).
Most software programmers have their own favorites, so not all software allows all plots. In Fig. 11, Q2 goes down after four components and R2 increases only slowly from four onwards, so four components, a rank of 4, would be good enough. Diagnostics like R2 and Q2 are based only on the variation explained. They have to be supplemented with a study of the obtained scores and loadings and a judgment of whether these make sense.
or piT pj = 0 if i ≠ j   (8)
or tiT tj = 0 if i ≠ j   (10)
There are some 10–20 named decompositions in the literature with confusing names and many of them are very similar to PCA or
can be made identical to PCA. Others have unique properties different from those of PCA. Only Multivariate curve resolution
(MCR) and Independent Component Analysis (ICA) are mentioned here. There is also a short mention of the Soft Independent
Modeling of Class Analogies (SIMCA) method as a popular application of PCA.
An important decomposition is “Multivariate Curve Resolution” or “Non-Negative factor Analysis” or “Self Modeling Curve
Resolution” and there are many such names. Outside chemistry there are even more different names for the same decomposition.
It is good to be aware of that.
Multivariate Curve Resolution (MCR) can be defined as a bilinear decomposition method that uses constraints related to chem-
ically meaningful properties of the profiles sought. In chemistry it may refer to situations where data cannot be negative. There exist
no negative concentrations, spectra or chromatograms. This means that principal component loadings and scores, that by definition
have negative parts, would give unrealistic views of the data. The MCR equation for three components is:
X = c1 s1T + c2 s2T + c3 s3T + E   (11)
where c1, c2 and c3 are vectors of concentrations of three chemicals and s1, s2 and s3 are their spectra at unit concentration, or normalized spectra. The vectors in Eq. (11) have no negative elements. E is a residual describing measurement noise. The spectra of
the mixtures are in X and only mixtures are available. The goal of the decomposition is to find the pure spectra and the corre-
sponding concentrations. The oldest reference is Lawton and Sylvestre from 1971.33
The concentrations and spectral values cannot become negative, so even though the equation above resembles that of PCA in
“The PCA Model, Singular Value Decomposition” section, the solution, the decomposition of X, must be a different one. It is also
not possible to mean-center, because this would introduce negative values artificially.
One of the properties of MCR that is different from PCA is that all components change if a component is added or removed.
Finding out how many components are needed is therefore very important. Sometimes PCA is used for a preliminary analysis
for finding how many components would be good to use in a factor analysis model. Some good explanations can be found on
the internet. Some good review articles are Refs. 34–37.
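As an added illustration of the idea behind MCR, here is a very condensed alternating least squares sketch with a crude non-negativity constraint (negative values are simply clipped to zero instead of using a true non-negative least squares step); it is a toy example, not a substitute for the dedicated MCR-ALS methods of Refs. 34–37.

```python
import numpy as np

def mcr_als(X, C0, n_iter=200):
    """Toy MCR-ALS: X (mixtures x channels) ~ C S^T with C, S >= 0.

    C0 is an initial guess of the concentration profiles. Each ALS step
    solves an unconstrained least squares problem and then clips negative
    values to zero (a crude stand-in for true non-negative least squares)."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.linalg.lstsq(C, X, rcond=None)[0].T      # spectra, channels x components
        S = np.clip(S, 0, None)
        C = np.linalg.lstsq(S, X.T, rcond=None)[0].T    # concentrations, mixtures x components
        C = np.clip(C, 0, None)
    return C, S

# simulated two-component mixture data
rng = np.random.default_rng(3)
S_true = np.abs(rng.normal(size=(50, 2)))               # two pure "spectra", 50 channels
C_true = rng.uniform(0, 1, size=(12, 2))                # 12 mixtures
X = C_true @ S_true.T + 0.01 * rng.normal(size=(12, 50))
C_est, S_est = mcr_als(X, C0=np.abs(rng.uniform(size=(12, 2))))
```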
Independent component analysis38 is based on the source separation problem. This is also referred to as the “cocktail party
problem” meaning to separate the voice of each speaker from the sound mixture that is observed. The equation for three compo-
nents is:
X = a1 s1T + a2 s2T + a3 s3T + E   (12)
Here [s1 s2 s3] are the independent signal profiles and [a1 a2 a3] are the mixing proportions. This is the same equation as for MCR and PCA; the only difference is in how the vectors are calculated. ICA does this by imposing statistical independence and non-normality on the vectors sa. There is no ANOVA decomposition (as in PCA) in ICA, and no orthogonal scores and loadings are calculated.
ICA can, just like PCA, be used as a first decomposition to guess the rank in MCR. According to some authors, ICA is closer to MCR than to PCA.39,40 A good comparison of PCA, ICA and MCR is given in Ref. 41.
Except in very special (synthetic) cases, PCA (SVD) always converges to the same solution, apart from sign flipping. In MCR and ICA this is not always the case, and the role and interpretation of the residual are not so straightforward.
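For comparison, a short added sketch of ICA on simulated mixed signals, using the FastICA implementation in scikit-learn (the choice of library is an assumption made here, not something prescribed by the text), together with PCA scores of the same data:

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.default_rng(4)
S_true = np.c_[np.sign(np.sin(np.linspace(0, 8, 200))),   # two non-Gaussian sources
               rng.uniform(-1, 1, 200)]
A_mix = np.array([[1.0, 0.5], [0.4, 1.2], [0.3, 0.8]])
X = S_true @ A_mix.T                                       # 200 observations, 3 mixed signals

ica = FastICA(n_components=2, random_state=0)
S_ica = ica.fit_transform(X)          # estimated independent source profiles
A_est = ica.mixing_                   # estimated mixing proportions

T_pca = PCA(n_components=2).fit_transform(X)   # PCA scores of the same data, for comparison
```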
PCA is often used in classification, and the SIMCA method42 should be mentioned. Basically, after definition of two or more classes of objects, a local PCA model is made for each class. These models may be of different rank. The classification then consists of taking new objects and determining to which local model they fit the closest. In this way PCA is also used in popular classification methods described in many chemometrics books.26,28,29,43,44
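A condensed added sketch of the SIMCA idea follows: a local PCA model is fitted per class and a new object is assigned to the class whose model gives the smallest residual sum of squares. The residual-based distance used here is one simple choice among several used in practice.

```python
import numpy as np

def fit_class_model(Xc, n_components):
    """Local PCA model of one class: class mean and loadings."""
    mean = Xc.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
    return mean, Vt[:n_components].T            # loadings, variables x A

def residual_ss(x, model):
    """Residual SS of a new object x with respect to a class model."""
    mean, P = model
    xc = x - mean
    return float(((xc - P @ (P.T @ xc))**2).sum())

rng = np.random.default_rng(5)
class1 = rng.normal(0, 1, size=(20, 10))
class2 = rng.normal(3, 1, size=(20, 10))
models = {"class1": fit_class_model(class1, 2),
          "class2": fit_class_model(class2, 2)}

x_new = rng.normal(3, 1, size=10)
assigned = min(models, key=lambda k: residual_ss(x_new, models[k]))
print("assigned to:", assigned)
```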
Table 2 (column headings): Number, Basic flour content, Energy KJ, Usfat g, Satfat g, CarbH g, Sugar g, Fiber g, Protein g, Salt g, Price/Kg.
All values are given per 100 g of bread. The variable names are abbreviations: Usfat, unsaturated fat; Satfat, saturated fat; CarbH, carbohydrates without sugar.
From this point on a simple example is introduced. Nine packages of Swedish crispbread were purchased in a supermarket. The
names and nutritional composition found on the packages are given in Table 2.
A number of interesting data analytical observations can be made:
2.03.6.2 Preprocessing
Before conducting any PCA, a thorough consideration of the nature of the data to be modeled is needed. A useful PCA model depends on how the data are preprocessed. Many instruments that generate multivariate data have their own idiosyncrasies that require pretreatment of the data, and these pretreatments are not absolute. A rule of thumb is “less is more”: the less preprocessing is done, the lower the risk of introducing bias. On the other hand, lack of proper preprocessing will lead to a non-optimal analysis of the data and a loss of potential information.
One of the nice properties of PCA is that there are no non-negativity demands so that many preprocessing methods are available.
It is impossible to describe all preprocessing methods available in commercial software, but a few need to be mentioned. As
shown in Eqs. (5) and (6) it is always possible to subtract the variable-wise mean. This is almost always done. The result is that
the first component does not describe the mean. Sometimes median subtraction is used.
A second important decision is whether or not to scale the variables. Some reasons to do this or not are:
1. all variables are in different units. In this case, rescaling is needed because just changing a unit could make a variable extremely
important or not at all important. Very often the variable values are divided by the standard deviation. This is called Unit
Variance (UV) scaling. This is the case for the variables in “PCA of the Composition Data” section, Fig. 13 below.
2. all variables are in the same units. For a lot of spectral data, small values mean noisy data and large values mean important data.
Then rescaling to bring all variables on the same footing is not a good thing to do because it blows up the noise. For the spectral
data in “PCA of the NIR Spectra of the Nine Bread Samples” section (Fig. 17) this is the case.
3. sometimes data come in blocks with different units per block. In that case, proportional scaling of all blocks (block scaling) makes
sense. In that way all blocks have the same chance of influencing the model. It may also be necessary to scale inside the blocks first.
The three situations above are linear scalings: each variable is multiplied by a constant. Sometimes nonlinear scaling is needed. Taking logarithms or powers (e.g., the power ½, a square root) may be needed for some data to make their distribution less skewed. No matter what scaling is used, removing the mean afterwards is always wise.
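The centering and scaling options discussed above can be written compactly. The added sketch below shows mean centering, UV scaling, a simple form of block scaling and a log transform; the block boundaries and data are arbitrary examples.

```python
import numpy as np

def mean_center(X):
    return X - X.mean(axis=0)

def uv_scale(X):
    """Unit Variance scaling: divide each centered variable by its standard deviation."""
    Xc = mean_center(X)
    return Xc / Xc.std(axis=0, ddof=1)

def block_scale(X, blocks):
    """Give each block of variables the same total variance.
    `blocks` is a list of column-index arrays; each block is UV scaled
    and then divided by the square root of its number of variables."""
    Xs = uv_scale(X)
    for cols in blocks:
        Xs[:, cols] /= np.sqrt(len(cols))
    return Xs

rng = np.random.default_rng(6)
X = np.abs(rng.normal(size=(9, 12))) + 0.1        # positive, somewhat skewed data
X_log = np.log(X)                                 # nonlinear scaling for skewed data
X_ready = block_scale(X_log, [np.arange(0, 4), np.arange(4, 12)])
X_ready = mean_center(X_ready)                    # remove the mean after scaling
```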
All scaling mentioned up to now was done variable-wise, but scaling may also be needed object-wise. This is especially true for
spectral data where the position of a variable between its neighbors is important. In “PCA of the NIR Spectra of the Nine Bread
Samples” section, there is an example presented of why object-wise scaling of spectral data is needed and how it is carried out.
1. For spectral data the physical nature of the data has to be considered, especially for solids where penetration depth is an
important factor. Absorption, transmission, reflection, emission, fluorescence intensity etc. may have to be transformed into
each other to give useful results.
2. Besides that, object-wise centring and scaling may become useful. There may also be a need for smoothing or calculating pseudo-
derivatives of the spectra.
Fig. 12 (A) Front side of selected specimens (top left KN1; bottom right KN9). (B) Backside of selected specimens.
Fig. 13 Raw data, after mean centering, and after mean centering and UV scaling, respectively.
3. Sometimes a complete transformation into Fourier or Wavelet coefficients gives the desired results.
Some definitions are found in the chemometrics books.28,29,43–45
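As an added illustration of object-wise spectral pretreatment, the sketch below applies Savitzky-Golay smoothing and a pseudo-derivative (using SciPy) to a simulated spectrum; the wavelength grid, band positions, window length and polynomial order are all arbitrary assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(7)
wavelength = np.linspace(1000, 1700, 140)                       # simulated NIR axis, nm
spectrum = (np.exp(-((wavelength - 1210) / 40)**2)
            + 0.5 * np.exp(-((wavelength - 1510) / 30)**2)
            + 0.02 * rng.normal(size=wavelength.size))          # two bands plus noise

smoothed = savgol_filter(spectrum, window_length=11, polyorder=2)             # smoothing
first_derivative = savgol_filter(spectrum, window_length=11, polyorder=2,
                                 deriv=1)                                      # pseudo-derivative
```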
Component | SS fraction explained | Cumulative
1 | 0.60 | 0.60
2 | 0.17 | 0.77
3 | 0.15 | 0.92
4 | 0.06 | 0.98
Fig. 14 (A) Schematic score plot (u1 vs. u2; stars represent objects) and loading plot (p1 vs. p2; circles represent variables), and their combination in a biplot (u1/p1 vs. u2/p2). (B) Score/loading biplot of the composition data, t[1]/p[1] vs. t[2] (Normalized)/p[2] (17.3%), showing the nine samples KN1–KN9 and the variables Energy KJ, Fat US g, Fat Sat g, CH g, Sugar g, Fiber g, Prot g, NaCl g and Price/Kg.
Fig. 15 Broken stick (scree plot) method for selecting the appropriate number of components. The red line shows SS per component and the dashed line shows the broken stick principle. Horizontal axis: component number (1–9).
The number of relevant components to extract is diagnosed using the “broken stick” approach (Fig. 15) (see also the “Number of Components Used A (Rank)” section). Here we can observe a gradient of diminishing influence of the score values in the PC model, presented as eigenvalues (score variance). The stick is “broken” at the third or fourth eigenvalue, meaning that higher components are of less importance and contain too much noise. This is one very simple but often effective way of deciding on the number of relevant components.
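A small added sketch of the broken stick comparison is given below; it assumes the standard broken stick values b_a = (1/K) Σ_{k=a..K} 1/k, expressed as percentages, and compares them with the SS percentage per component.

```python
import numpy as np

def broken_stick(K):
    """Expected percentage of variance for each of K components under the broken stick model."""
    return np.array([sum(1.0 / k for k in range(a, K + 1)) / K
                     for a in range(1, K + 1)]) * 100

rng = np.random.default_rng(8)
X = rng.normal(size=(18, 9))
Xmc = X - X.mean(axis=0)
s = np.linalg.svd(Xmc, compute_uv=False)
explained = 100 * s**2 / (s**2).sum()            # SS per component, in percent

# components are kept as long as they explain more than the broken stick value
keep = explained > broken_stick(len(explained))
n_keep = len(keep) if keep.all() else int(np.argmin(keep))
print("suggested number of components:", n_keep)
```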
As we are now dealing with NIR spectra, there are chemical and sampling issues to be observed, and any clustering in the data will be related to these.
In Fig. 16, the first and second PC from this analysis are plotted. Here, three “clusters” can be identified. One cluster of seven objects is found in the lower left corner of the figure. These specimens are all “wheat contaminated” rye based products. In the lower right corner, there is a cluster of “pure” rye products. At the strongly positive end of the 2nd component, the exotic sprinkled side of some of the specimens seems to have a very strong influence.
Fig. 16 Score plot presenting the 9 raw bread specimens (back and front side, 18 objects), with information added on additives (sprinkles). In this case, the average values of the five NIR spectral measurements on each bread side have only been preprocessed by mean centering.
Table 4 Additional characteristics of the crispbread specimens and the distance in the PC model between front and back. Columns: LAB ID, Base cerealia, Flakes, Seeds/sprinkles, Distance between back and front.
However, the front and back are in some cases very similar and in others very different, depending on the treatment and sprinkles added to the bread. In Table 4, the visually determined distance in the PC model between the front and back of the same specimen is shown. In this case, the NIR spectra pick up chemical differences not visible to the naked eye; looking back at the pictures in Fig. 12, these differences are clearly not apparent.
As seen from the section and cases above, it is complicated to acquire representative spectra from heterogeneous materials such as these specimens. Because of this, the specimens were ground with a mortar and pestle and the powder produced was reanalyzed with the NIR probe. This grinding and homogenization of the samples affects the collection of data in several ways, because homogeneous material, seed oil and core materials will be included in the matrix and optical surface reflections will be reduced.
As with the examples in the previous section, we need to preprocess the data in order to improve the information content and reduce optical scattering etc. What we need is something that enables us to retain the chemical information in these spectra. In Fig. 17, the effects of the pretreatment procedure can be surveyed graphically; panels (A) and (B) still reflect the physical optical effects that mask the chemical information, and the different stages of preprocessing can be followed.
Fig. 17 NIR spectra: (A) raw, (B) off-set corrected, (C) slope corrected. The horizontal axis is variable number (wavelength) and the vertical axis is the absorbance value.
Fig. 17 (Continued).
Fig. 18 Score values (t1-t2) on average spectra of the 9 specimens, this time on homogenised samples of crisp bread. Here we have used SNV and
mean centering to reduce scattering effects.
In Fig. 17, (A) represents the raw NIR spectra, (B) the off-set corrected spectra, in which effects of surface smoothness and surface reflections are eliminated, and (C) the slope corrected spectra, in which the influence of particle size is minimized. The procedure from (A) to (C) thus represents an SNV correction. This preprocessing retains mainly the chemical information, shown in (C).
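The off-set and slope correction from (A) to (C) corresponds to the usual formulation of the Standard Normal Variate (SNV) transform: each spectrum (object) is centered by its own mean and divided by its own standard deviation. A minimal added sketch, with simulated spectra, is:

```python
import numpy as np

def snv(X):
    """Standard Normal Variate: from each spectrum (row) subtract its own mean
    (off-set correction) and divide by its own standard deviation (slope/scatter correction)."""
    means = X.mean(axis=1, keepdims=True)
    stds = X.std(axis=1, ddof=1, keepdims=True)
    return (X - means) / stds

rng = np.random.default_rng(9)
spectra = rng.normal(size=(9, 109)) + np.linspace(0, 1, 109)   # 9 objects, 109 wavelengths
spectra_snv = snv(spectra)
X_ready = spectra_snv - spectra_snv.mean(axis=0)               # variable-wise mean centering before PCA
```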
If a new PC model is calculated based on this new data set (109 wavelengths and 9 objects, mean centering and SNV processed),
the PC1 and PC2 plot can be seen in Fig. 18.
So what would be a significant number of components in this case? If we calculate four components we find that they explain 99.3% of the variation of the data set. The likelihood of a fifth component adding interpretable information is thus very low.
The clustering in the plot can be explained by the fact that the “exotic” ingredients are chemically quite different from the pure flour based breads (KN2-KN4). Obviously, sesame seeds and chia-quinoa have a very different composition from ordinary rye flour. Also, these three specimens have significant amounts of wheat flour in the dough compared to the others.
Looking back at Fig. 14, we can see that analyzing the NIR spectral data gives a similar clustering as in Fig. 18, with some deviations. The main differences are found in the second component.
So what do we get from looking at a 3rd component in this case? This component explains another 11.1% of the variation in the data set. Taking the 3rd component into consideration (Fig. 19), we see that three clusters can now be more clearly identified. Additional value for interpretation is found in the loading line plot (Fig. 20), which shows which wavelengths separate the two lower groups. All three loadings have significant peaks that contribute to the clustering in the score plot. In this plot we can also see which peaks influence the positive and negative score values. The main peak in the second loading (around 1210 nm) is related to C–H bonds (which, interestingly, is not picked up by the first PC score). A peak at 1510 nm for proteins is also observable, and looking at Figs. 14 and 19 we see that the same specimens appear in relation to the protein rich breads. Obviously, more data mining and spectral interpretations are possible but are left out here.
The residual of a model is often ignored, but it can be used to visualize the contribution of noise to a model. In Fig. 21, we can see that after three components there are still relevant peaks in the residual that influence the model. Pushing the model further by calculating additional components gradually shows a diminishing spectral importance and an increasing importance (or disturbance) of noise.
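A plot like Fig. 21 can be produced by taking the standard deviation over the objects of the residual matrix for increasing numbers of components; the added sketch below shows the calculation on simulated data of the same size.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(9, 109))                   # 9 objects, 109 wavelengths (simulated)
Xmc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xmc, full_matrices=False)

for A in (3, 4, 5):
    E = Xmc - (U[:, :A] * s[:A]) @ Vt[:A]       # residual after A components
    residual_std = E.std(axis=0, ddof=1)        # standard deviation over the nine objects
    print(f"A = {A}: mean residual std = {residual_std.mean():.4f}")
```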
Fig. 19 Score values (t2-t3) for the average spectra of the nine homogenised crispbread specimens.
Fig. 20 Loadings for the average spectra of the nine homogenised crispbread specimens.
Fig. 21 Standard deviation over nine objects, plotted against wavelength (1000–1700 nm): the fit for three components (blue) and the residual for three (orange), four (green) and five (magenta) components. Some preliminary peak interpretations (ROH, OH, glucose, CH2, CH3, NH) are shown.
References
1. Stewart, G. On the Early History of Singular Value Decomposition. SIAM Rev. 1993, 35, 551–566.
2. Jolliffe, I. Principal Component Analysis, 2nd ed.; Springer: Berlin, 2002.
3. Wold, S.; Esbensen, K.; Geladi, P. Principal Component Analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
4. Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philos. Mag. 1901, 6, 559–572.
5. Fischer, R.; MacKenzie, W. Studies in Crop Variation. II The Manurial Response of Different Potato Varieties. J. Agric. Sci. 1923, 13, 311–320.
6. Hotelling, H. Analysis of a Complex of Statistical Variables Into Principal Components. J. Educ. Psychol. 1933, 24, 417–441.
7. Albano, C.; Dunn, W.; Edlund, U.; Johansson, E.; Horden, B.; Sjöström, M.; Wold, S. Four Levels of Pattern Recognition. Anal. Chim. Acta 1978, 103, 429–443.
8. Frank, I.; Kowalski, B. Chemometrics. Anal. Chem. 1982, 54, 232R–243R.
9. Smilde, A.; Bro, R.; Geladi, P. Multi-Way Analysis, Applications in the Chemical Sciences, Wiley: Chichester, 2004.
10. Geladi, P.; Grahn, H. Multivariate Image Analysis, Wiley: Chichester, 1996.
11. Sasic, S., Ozaki, Y., Eds.; Raman, Infrared and Near-Infrared Chemical Imaging, Wiley: Hoboken, NJ, 2010.
12. Grahn, H., Geladi, P., Eds.; Techniques and Applications of Hyperspectral Image Analysis, Wiley: Chichester, 2007.
13. Park, B., Lu, R., Eds.; Hyperspectral Imaging Technology in Food and Agriculture, Springer: New York, NY, 2015.
14. Basantia, N., Nollet, L., Kamruzzaman, M., Eds.; Hyperspectral Imaging Analysis and Applications for Food Quality, CRC Press: Boca Raton, FL, 2018.
15. Chang, C. Hyperspectral Imaging. Techniques for Spectral Detection and Classification, Springer: New York, NY, 2003.
16. Lillesand, T.; Kiefer, R.; Chipman, J. Remote Sensing and Image Interpretation, 7th ed.; Wiley: Hoboken, NJ, 2015.
17. Campbell, J.; Wynne, R. Introduction to Remote Sensing, 5th ed.; Guilford Publications: New York, NY, 2011.
18. Croux, C.; Filzmoser, P.; Fritz, H. Robust Sparse Principal Component Analysis. Technometrics 2012, 55, 202–214.
19. Ammann, L. Robust Singular Value Decompositions: A New Approach to Projection Pursuit. J. Am. Stat. Assoc. 1993, 88, 505–514.
20. Strang, G. Introduction to Linear Algebra, 5th ed.; Wellesley Cambridge Press: Wellesley, MA, 2016.
21. Golub, G.; Van Loan, C. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, 2013.
22. Watkins, D. Fundamentals of Matrix Computations, Wiley-Blackwell: Hoboken, NJ, 2010.
23. Davis, J. Statistics and Data Analysis in Geology, Wiley: New York, NY, 1973.
24. Mardia, K.; Kent, J.; Bibby, J. Multivariate Analysis, Academic Press: London, 1979.
25. Johnson, R.; Wichern, D. Applied Multivariate Statistical Analysis, Prentice-Hall: Englewood Cliffs, NJ, 1982.
26. Brereton, R., Ed.; Multivariate Pattern Recognition in Chemometrics, Illustrated by Case Studies, Elsevier: Amsterdam, 1992.
27. McLennan, F., Kowalski, B., Eds.; Process Analytical Chemistry, Blackie Academic & Professional: London, 1995.
28. Henrion, R.; Henrion, G. Multivariate Datenanalyse, Springer: Berlin, 1995.
29. Gemperline, P., Ed.; Practical Guide to Chemometrics, 2nd ed.; CRC Press: Boca Raton, FL, 2006.
30. Jackson, J. A User’s Guide to Principal Components, Wiley: New York, NY, 1991.
31. Gray, V., Ed.; Principal Component Analysis: Methods, Applications, Technology, Nova Science Publishers: New York, NY, 2017.
32. Cross, R., Ed.; Principal Component Analysis Handbook, Clanrye International: New York, NY, 2015.
33. Lawton, W.; Sylvestre, E. Self Modeling Curve Resolution. Technometrics 1971, 13, 617–633.
34. Hamilton, J.; Gemperline, P. Mixture Analysis Using Factor Analysis. II: Self-Modeling Curve Resolution. J. Chemom. 1990, 4, 1–13.
35. Liang, Y.; Kvalheim, O. Resolution of Two-Way Data: Theoretical Background and Practical Problem-Solving. Part 1: Theoretical Background and Methodology. Fresenius’ J.
Anal. Chem. 2001, 370, 694–704.
36. Tauler, R.; Barcelo, D. Multivariate Curve Resolution and Calibration Applied to Liquid Chromatography-Diode Array Detection. TrAC, Trends Anal. Chem. 1993, 12, 319–327.
37. Jiang, J.; Ozaki, Y. Self-Modeling Curve Resolution (SMCR): Principles, Techniques and Applications. Appl. Spectrosc. Rev. 2002, 37, 321–345.
38. Comon, P. Independent Component Analysis, a New Concept? Signal Process. 1994, 36, 287–314.
39. Hyvärinen, A.; Oja, E. A Fast Fixed-Point Algorithm for Independent Component Analysis. Neural Comput. 1997, 9, 1483–1492.
40. Bingham, E.; Hyvärinen, A. A Fast Fixed-Point Algorithm for Independent Component Analysis of Complex Valued Signals. Int. J. Neural Syst. 2000, 10, 1–8.
41. Parastar, H.; Jalali-Heravi, M.; Tauler, R. Is Independent Component Analysis Appropriate for Multivariate Resolution in Analytical Chemistry. TrAC, Trends Anal. Chem. 2012,
31, 134–143.
42. Wold, S.; Sjostrom, M. SIMCA: A Method for Analyzing Chemical Data in Terms of Similarity and Analogy. In Chemometrics Theory and Application; Kowalski, B., Ed.; American
Chemical Society Symposium Series 52, American Chemical Society: Washington, DC, 1977; pp 243–282.
43. Beebe, K.; Pell, R.; Seasholtz, M. Chemometrics. A Practical Guide, Wiley: New York, NY, 1998.
44. Brereton, R. Chemometrics. Data Analysis for the Laboratory and Chemical Plant, Wiley: Chichester, 2003.
45. Naes, T.; Isaksson, T.; Fearn, T.; Davies, T. A User-Friendly Guide to Multivariate Calibration and Classification, NIR Publications: Chichester, 2002.