Change History: February 2020. P Geladi updated the text, figures and references.
The role of ANOVA and noise in PCA was mentioned, which means that diagnostic tools for judging the meaningfulness of a PCA model should be explained. Most of the diagnostics benefit from visualization in plots, and this is introduced with a small example (Section “Diagnostics and Visualization for PCA”). The success of a PCA analysis depends strongly on how the data are pretreated, and this depends on the variable types and on where the data come from. Some of this is included in the examples in the “Diagnostics and Visualization for PCA” section.
2.03.1.2 History
The Singular Value Decomposition for rectangular matrices and its history are described by Stewart.1 More about the history of PCA
and SVD can be found in Jolliffe2 and a description of PCA for chemometrics was given in Ref. 3.
A simple form of SVD, the eigenvector/eigenvalue decomposition for square matrices, was introduced independently by Arthur Cayley in the UK and William Hamilton in Ireland during the middle 1850s. It is sometimes called the Cayley-Hamilton decomposition.
decomposition. Later in that decade, Jacques Hadamard in France gave a Cayley-Hamilton decomposition equation for complex
number square matrices. Eugenio Beltrami of Italy and Camille Jordan of France are the originators of the decomposition method
SVD for rectangular matrices in 1873–74. Because calculations had to be made by hand, only small matrices were used in those early
days. One should also mention Karl Pearson’s work in 1901 (Ref. 4) on the ANOVA decomposition of data matrices and an early paper by Fischer and MacKenzie from 1923 (Ref. 5) that used SVD on real data from the natural sciences. This is probably the first chemometrics/biometrics paper using multivariate data. An important article by the American Harold Hotelling in 1933 introduced the name “Principal Components” (Ref. 6). Many of these old papers can be found as free pdf files on the internet, and many more historical articles are mentioned in Refs. 2,3. Chemometrics appeared during the 1970s and much of the early chemometrics work used PCA.7,8
Showing vectors and matrices is best done graphically. Fig. 1 shows some definitions. One datum is just a number. Usually these
numbers are collected on many objects and then they form a column vector. Object is a generic name that can stand for samples,
trees, patients, fields etc. A row vector is data for one object. These data can be many things such as temperatures, pressures, densities,
viscosities, mass numbers, wavenumbers etc. The list can be made very long but the generic name is variable. If there are many vari-
ables measured on many objects, then a data matrix has to be formed by putting all row vectors under each other. Fig. 1 shows
a data matrix with I rows and J columns. Each datum in this matrix has coordinates (i,j) with i = 1,…,I and j = 1,…,J. It should
Fig. 1 Definition of a scalar, row and column vectors, and the data matrix (objects i = 1,…,I as rows; variables j = 1,…,J as columns).
be clear that long vectors or large matrices are hard to study just as collections of numbers. The size of data matrices is different in the
different sciences. In chemometrics and analytical chemistry, instrumentation has largely replaced single-element determinations by titration, so spectral data with many variables occur in XRF, NIR, IR, Raman, NMR, mass spectrometry etc. At the same time, the number of objects is not always so large, although in the process industries, with regular measurements made over long periods of time, many objects can be collected.
From the definition of a data matrix it is easy to define even larger structures; some examples are given in Fig. 2. One such array is a data matrix with objects and variables measured over time, giving a three-way array. Three-way arrays have their own methods of data analysis.9 The other array is a multivariate or hyperspectral image, where each point in the image is made up of a vector of wavelengths, wavelength bands etc. Such arrays are produced very frequently in satellite and airborne imaging, but laboratory equipment for making such images also exists.10–14 The literature on airborne/satellite imaging is huge and only a few examples are given here.15–17 For multivariate or hyperspectral images, PCA can be useful after rearranging the data.
An important aspect of data analysis to be taken up here is missing data and bad data. Missing data can always occur in any data matrix. A small amount (1–2%) of randomly spread missing data is acceptable. Large amounts of missing data, or missing data spread in a systematic way, are a problem; see Fig. 3. There are imputation methods that guess the value that fits in the hole and thus make the matrix complete. It is also possible to use algorithms that ignore the holes in the calculation. Some references are 18,19. Another thing to remember is that some manufacturers of instrumentation automatically fill gaps after a measurement, which makes the data matrix look free of gaps.
Fig. 2 Examples of larger data structures: a three-way array and a multivariate (hyperspectral) image.
Fig. 3 Left: A data matrix may have a small amount of randomly spread holes, and this is easily remedied. Right: Many holes, or a too systematic location of the holes, is bad, because the remedies will not work properly and will give misleading information.
The same as above is valid for outliers, i.e., erroneous or extreme data points. When such outliers are removed they leave holes. PCA is a very good technique for having a quick look at a large(r) data matrix and getting an overview. This overview allows one to find where the erroneous or bad data are, and once this is remedied a new PCA allows one to see the general trends in the data. More about this comes in later sections.
Data matrices may have extreme shapes. One example is an industrial biochemical process that takes 2 weeks and where temper-
ature, pressure, turbidity and viscosity are measured every 10 min in order to follow the reaction’s progress. This would create a data
matrix of size 2016 × 4. Another example is an experimental design of 11 runs where each run gives an IR spectrum of 1500 wavelengths. That would be a data matrix of 11 × 1500. In the first case variables were expensive to come by but frequent sampling was
no problem. In the second case it was easy to collect many variables but the samples were expensive to make. In “omics” sciences,
very large data matrices (many objects, many variables) may be obtained.
An important observation is that of the nature of the variables. In the IR spectrum example above, all variables are measured in
the same units and their order is important. Putting the wavelengths in random order would be nonsense for interpreting the results.
In a spectrum, wavelengths with low absorption values are less interesting than ones with high absorption values, so no rescaling is
needed. For the industrial example, the variables are all measured with different equipment and in different units and their order is
not important. Temperatures in degrees C and pressures in Pascal also have quite different numerical values and this means that the
ones with higher numerical values would dominate the analysis and ones with lower values would automatically end up in the
noise. Therefore, these data are usually scaled by their inverse standard deviation.
Some data have mixed variables. Part of the data are all in the same units (spectra, chromatograms) and in a certain order while
others are in different units and order does not matter (temperatures, pressures, viscosities). All these situations require some
thinking and clever scaling and some of this is explained in “Diagnostics and Visualization for PCA” section.
PCA and SVD cannot be understood without a minimum of linear algebra and there are good books on the subject.20–22 There are
strict mathematical definitions of the operations of linear algebra but here a simplified graphical definition will suffice. Fig. 4 shows
some important operations. For addition or subtraction, all matrices should be of the same size. Addition or subtraction is done
elementwise. The figure also shows the multiplication by a scalar. This means that every element in the matrix is multiplied by
the scalar. Per element the operation is:
vij = a xij + b yij + c zij, for all i and all j   (1)
Fig. 4 Upper: Addition/subtraction and scalar multiplication of matrices of equal size. V, X, Y, Z are matrices; a, b and c are scalars. Lower: The same operation for the column vectors v, x, y, z.
Fig. 4 also shows some nomenclature. A scalar (a, b, c) is represented by a lowercase character, and a matrix (V, X, Y, Z) by an uppercase boldface character. Vectors (v, x, y, z) are just matrices where the number of rows or columns is one; they are given by lowercase boldface characters.
If one adopts the convention that all vectors are column vectors, transposition is needed to make a row vector from a column vector. The superscript T is used to show this: x becomes xT, x transposed. Vectors and matrices can be multiplied by the dot product, as Fig. 5 (left) shows. A row vector xT multiplied by a column vector y gives a scalar a: xT·y = a. If the two vectors are identical, this scalar is the Sum of Squares (SS) of the vector: xT·x = a = SS(x). Fig. 5 (right) shows another product of two vectors, giving a matrix: y·xT = W. This is the outer product. Fig. 6 gives something similar, but now for matrices.
With the matrix/vector operations in Figs. 4 and 5 the PCA equation can be constructed in the next section.
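As a small added illustration (not from the original text), the NumPy sketch below reproduces these vector operations numerically; the vectors are arbitrary examples.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # column vector x (stored as a 1-D array)
y = np.array([0.5, -1.0, 2.0])

a = x @ y                          # inner product xT.y, a scalar
ss_x = x @ x                       # xT.x = sum of squares SS(x)
W = np.outer(y, x)                 # outer product y.xT, a matrix

print(a, ss_x)
print(W)
```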
Fig. 5 Multiplication of vectors. Left: The inner product of two vectors of the same size. Right: The outer product of two vectors of different size.
Fig. 6 An outer product of two matrices. Each point xkj in the matrix X is an inner product over I elements: xkj = sum over all i of (tki pij). The multiplication symbol can be left out if it is clear what operation is meant.
Fig. 7 The PCA equation X = T PT + E. If A is large enough (here A = K), E becomes a matrix of zeroes and can be left out.
Fig. 8 The SVD equation X = U S VT + E. If A is large enough (here A = K), E becomes a matrix of zeroes and can be left out. S is a diagonal matrix with the singular values on the diagonal. This is the same as Fig. 7 by setting T = US.
Fig. 9 The PCA equation for three components, Xmc = t1 p1T + t2 p2T + t3 p3T + E. Notes: Multiplication symbols are left out, vectors are represented by lines, not rectangles as before, and sizes are not given.
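To make the link between Figs. 7–9 concrete, here is a brief added sketch (assuming an arbitrary simulated data matrix) that computes the SVD of a mean-centered X and forms the scores as T = US and the loadings as the columns of V, so that Xmc ≈ T PT + E.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))            # I = 20 objects, K = 5 variables
Xmc = X - X.mean(axis=0)                # column-wise mean centering

U, s, Vt = np.linalg.svd(Xmc, full_matrices=False)
A = 3                                   # number of components kept
T = U[:, :A] * s[:A]                    # scores T = U S
P = Vt[:A].T                            # loadings P

E = Xmc - T @ P.T                       # residual after A components
print("explained SS fraction:", (s[:A]**2).sum() / (s**2).sum())
```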
Fig. 10 In a scatter plot of two mean-centered variables (Var1, Var2), a principal component is the line that maximizes the sum of squares of the projections on the line, the scores. The distances to the line (dashed arrows) are minimized in the sum of squares sense. The cosines of the angles a1 and a2 are the elements of the first loading vector p1.
The algorithms used to calculate principal component models are of two kinds: those that calculate all scores and loadings at once, using an SVD-like calculation, and those that calculate the components one by one and stop when the desired number of components is reached. One algorithm used for the latter is called NIPALS (Nonlinear Iterative PArtial Least Squares). Extended literature on these matters is found in the reference list.23–32
For large data sets where only a few components are needed, the one-by-one calculation may be the only feasible alternative.
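The following is a minimal NIPALS sketch for the one-by-one calculation; it is an illustrative implementation written for this text, not the code of any particular software package, and it assumes X has already been mean centered.

```python
import numpy as np

def nipals_pca(X, n_components, max_iter=500, tol=1e-10):
    """One-by-one extraction of principal components by NIPALS.

    X is assumed to be mean centered. Returns scores T (I x A) and
    loadings P (K x A). After each component the rank-one term t pT
    is subtracted (deflation) before the next one is computed."""
    X = X.copy()
    I, K = X.shape
    T = np.zeros((I, n_components))
    P = np.zeros((K, n_components))
    for a in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))]      # start from the variable with the largest variance
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)               # loading estimate
            p /= np.linalg.norm(p)              # normalize loading to unit length
            t_new = X @ p                       # score estimate
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        T[:, a], P[:, a] = t, p
        X = X - np.outer(t, p)                  # deflate before the next component
    return T, P

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
T, P = nipals_pca(X - X.mean(axis=0), 3)
```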
In the sum of squares sense, the sum of squares (SS) of X, SSX, is the sum of the SS of all principal components (SS1, SS2 and SS3 in the equation below for a model with three principal components) and the SS of the residual, SSE. The numerical example makes this clear. The first component explains as much as possible of the SS of X. The second component explains the maximum SS when SS1 is removed, etc.:
SSX = SS1 + SS2 + SS3 + SSE   (here for only three components, but this can be generalized to any number)
100 = 62 + 25 + 9 + 4
100% = 62% + 25% + 9% + 4%
Sum of squares values may be shown in an ANOVA table. Such an ANOVA table could be used for testing each component
against the residual in an F-test. More about choosing A is given in the next subsection and in the diagnostics section.
Source | SS | Degrees of freedom
SS1 | 62 | 1
SS2 | 25 | 1
SS3 | 9 | 1
SSE | 4 | I − 4
SSX | 100 | I − 1
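The SS decomposition in the ANOVA table can be checked numerically; in the added sketch below (simulated data), the squared singular values of the mean-centered matrix give the SS per component.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 6))
Xmc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xmc, full_matrices=False)
A = 3
ss_x = (Xmc**2).sum()                    # total SS of the mean-centered data
ss_components = s[:A]**2                 # SS explained by each of the first A components
ss_e = ss_x - ss_components.sum()        # residual SS

print("SS per component (%):", 100 * ss_components / ss_x)
print("SSE (%):", 100 * ss_e / ss_x)
print("check:", np.isclose(ss_x, ss_components.sum() + ss_e))
```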
Fig. 11 R2 and Q2 as a function of the number of components. R2 (circles) always increases as a function of the number of components. The cross-validation based Q2 (triangles) decreases or stagnates when the systematic information is used up and the components become noisy.
related eigenvalues, small and similar among themselves. Graphically, the point at which noise-related eigenvalues start appearing can be detected with the ‘broken stick’ approach (see Fig. 15).
Most software programmers have their own favorites, so not all software allows all plots. In Fig. 11, Q2 goes down after four components and R2 increases only slowly from four onwards, so four components, a rank of 4, would be good enough. Diagnostics like R2 and Q2 are based only on the variation explained. They have to be supplemented with a study of the obtained scores and loadings and a judgment of whether these make sense.
or piT pj = 0 if i ≠ j   (8)
or tiT tj = 0 if i ≠ j   (10)
There are some 10–20 named decompositions in the literature with confusing names and many of them are very similar to PCA or
can be made identical to PCA. Others have unique properties different from those of PCA. Only Multivariate curve resolution
(MCR) and Independent Component Analysis (ICA) are mentioned here. There is also a short mention of the Soft Independent
Modeling of Class Analogies (SIMCA) method as a popular application of PCA.
An important decomposition is “Multivariate Curve Resolution” or “Non-Negative factor Analysis” or “Self Modeling Curve
Resolution” and there are many such names. Outside chemistry there are even more different names for the same decomposition.
It is good to be aware of that.
Multivariate Curve Resolution (MCR) can be defined as a bilinear decomposition method that uses constraints related to chem-
ically meaningful properties of the profiles sought. In chemistry it may refer to situations where data cannot be negative. There exist
no negative concentrations, spectra or chromatograms. This means that principal component loadings and scores, that by definition
have negative parts, would give unrealistic views of the data. The MCR equation for three components is:
X = c1 s1T + c2 s2T + c3 s3T + E   (11)
where c1, c2 and c3 are vectors of concentrations of three chemicals and s1, s2 and s3 are their spectra at unit concentration, or normalized spectra. The vectors in Eq. (11) have no negative elements. E is a residual describing measurement noise. The spectra of
the mixtures are in X and only mixtures are available. The goal of the decomposition is to find the pure spectra and the corre-
sponding concentrations. The oldest reference is Lawton and Sylvestre from 1971.33
The concentrations and spectral values cannot become negative, so even though the equation above resembles that of PCA in
“The PCA Model, Singular Value Decomposition” section, the solution, the decomposition of X, must be a different one. It is also
not possible to mean-center, because this would introduce negative values artificially.
One of the properties of MCR that is different from PCA is that all components change if a component is added or removed.
Finding out how many components are needed is therefore very important. Sometimes PCA is used for a preliminary analysis
for finding how many components would be good to use in a factor analysis model. Some good explanations can be found on
the internet. Some good review articles are Refs. 34–37.
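As an added illustration of the idea behind MCR, here is a very condensed alternating least squares sketch with a crude non-negativity constraint (negative values are simply clipped to zero instead of using a true non-negative least squares step); it is a toy example, not a substitute for the dedicated MCR-ALS methods of Refs. 34–37.

```python
import numpy as np

def mcr_als(X, C0, n_iter=200):
    """Toy MCR-ALS: X (mixtures x channels) ~ C S^T with C, S >= 0.

    C0 is an initial guess of the concentration profiles. Each ALS step
    solves an unconstrained least squares problem and then clips negative
    values to zero (a crude stand-in for true non-negative least squares)."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.linalg.lstsq(C, X, rcond=None)[0].T      # spectra, channels x components
        S = np.clip(S, 0, None)
        C = np.linalg.lstsq(S, X.T, rcond=None)[0].T    # concentrations, mixtures x components
        C = np.clip(C, 0, None)
    return C, S

# simulated two-component mixture data
rng = np.random.default_rng(3)
S_true = np.abs(rng.normal(size=(50, 2)))               # two pure "spectra", 50 channels
C_true = rng.uniform(0, 1, size=(12, 2))                # 12 mixtures
X = C_true @ S_true.T + 0.01 * rng.normal(size=(12, 50))
C_est, S_est = mcr_als(X, C0=np.abs(rng.uniform(size=(12, 2))))
```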
Independent component analysis38 is based on the source separation problem. This is also referred to as the “cocktail party
problem” meaning to separate the voice of each speaker from the sound mixture that is observed. The equation for three compo-
nents is:
X = a1 s1T + a2 s2T + a3 s3T + E   (12)
Here [s1 s2 s3] are the independent signal profiles and [a1 a2 a3] are the mixing proportions. This is the same equation as for MCR and PCA; the only difference is in how the vectors are calculated. ICA does this by imposing statistical independence and non-normality on the vectors sa. There is no ANOVA decomposition (as in PCA) in ICA, and no orthogonal scores and loadings are calculated.
ICA can, just like PCA, be used as a first decomposition to guess the rank in MCR. According to some authors, ICA is closer to MCR than to PCA.39,40 A good comparison of PCA, ICA and MCR is given in Ref. 41.
Except in very special (synthetic) cases, PCA (SVD) always converges to the same solution, apart from sign flipping. In MCR and ICA this is not always the case, and the role and interpretation of the residual are not so straightforward.
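For comparison, a short added sketch of ICA on simulated mixed signals, using the FastICA implementation in scikit-learn (the choice of library is an assumption made here, not something prescribed by the text), together with PCA scores of the same data:

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.default_rng(4)
S_true = np.c_[np.sign(np.sin(np.linspace(0, 8, 200))),   # two non-Gaussian sources
               rng.uniform(-1, 1, 200)]
A_mix = np.array([[1.0, 0.5], [0.4, 1.2], [0.3, 0.8]])
X = S_true @ A_mix.T                                       # 200 observations, 3 mixed signals

ica = FastICA(n_components=2, random_state=0)
S_ica = ica.fit_transform(X)          # estimated independent source profiles
A_est = ica.mixing_                   # estimated mixing proportions

T_pca = PCA(n_components=2).fit_transform(X)   # PCA scores of the same data, for comparison
```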
PCA is often used in classification, and the SIMCA method42 should be mentioned. Basically, after definition of two or more classes of objects, a local PCA model is made for each class. These models may be of different rank. The classification then consists of taking new objects and determining to which local model they fit the closest. In this way PCA is also used in popular classification methods described in many chemometrics books.26,28,29,43,44
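A condensed added sketch of the SIMCA idea follows: a local PCA model is fitted per class and a new object is assigned to the class whose model gives the smallest residual sum of squares. The residual-based distance used here is one simple choice among several used in practice.

```python
import numpy as np

def fit_class_model(Xc, n_components):
    """Local PCA model of one class: class mean and loadings."""
    mean = Xc.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
    return mean, Vt[:n_components].T            # loadings, variables x A

def residual_ss(x, model):
    """Residual SS of a new object x with respect to a class model."""
    mean, P = model
    xc = x - mean
    return float(((xc - P @ (P.T @ xc))**2).sum())

rng = np.random.default_rng(5)
class1 = rng.normal(0, 1, size=(20, 10))
class2 = rng.normal(3, 1, size=(20, 10))
models = {"class1": fit_class_model(class1, 2),
          "class2": fit_class_model(class2, 2)}

x_new = rng.normal(3, 1, size=10)
assigned = min(models, key=lambda k: residual_ss(x_new, models[k]))
print("assigned to:", assigned)
```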
Table 2 (column headings): Number, Basic flour content, Energy KJ, Usfat g, Satfat g, CarbH g, Sugar g, Fiber g, Protein g, Salt g, Price/Kg.
All values are given per 100 g of bread. The variable names are abbreviations: Usfat, unsaturated fat; Satfat, saturated fat; CarbH, carbohydrates without sugar.
From this point on a simple example is introduced. Nine packages of Swedish crispbread were purchased in a supermarket. The
names and nutritional composition found on the packages are given in Table 2.
A number of interesting data analytical observations can be made:
2.03.6.2 Preprocessing
Before conducting any PCA, a thorough consideration of the nature of the data to be modeled is needed. A useful PCA model depends on how the data are preprocessed. Many instruments that generate multivariate data have their own idiosyncrasies that require pretreatment of the data, and these pretreatments are not absolute. A rule of thumb is “less is more”: the less preprocessing is done, the lower the risk of introducing bias. On the other hand, lack of proper preprocessing will lead to a non-optimal analysis of the data and a loss of potential information.
One of the nice properties of PCA is that there are no non-negativity demands so that many preprocessing methods are available.
It is impossible to describe all preprocessing methods available in commercial software, but a few need to be mentioned. As
shown in Eqs. (5) and (6) it is always possible to subtract the variable-wise mean. This is almost always done. The result is that
the first component does not describe the mean. Sometimes median subtraction is used.
A second important decision is whether or not to scale the variables. Some reasons to do this or not are:
1. all variables are in different units. In this case, rescaling is needed because just changing a unit could make a variable extremely
important or not at all important. Very often the variable values are divided by the standard deviation. This is called Unit
Variance (UV) scaling. This is the case for the variables in “PCA of the Composition Data” section, Fig. 13 below.
2. all variables are in the same units. For a lot of spectral data, small values mean noisy data and large values mean important data.
Then rescaling to bring all variables on the same footing is not a good thing to do because it blows up the noise. For the spectral
data in “PCA of the NIR Spectra of the Nine Bread Samples” section (Fig. 17) this is the case.
3. sometimes data come in blocks with different units per block. In that case, proportional scaling of all blocks (block scaling) makes
sense. In that way all blocks have the same chance of influencing the model. It may also be necessary to scale inside the blocks first.
The three situations above are linear scalings: each variable is multiplied by a constant. Sometimes nonlinear scaling is needed. Taking logarithms or powers (e.g., the power ½, a square root) may be needed for some data to make their distribution less skewed. No matter what scaling is used, removing the mean afterwards is always wise.
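The centering and scaling options discussed above can be written compactly. The added sketch below shows mean centering, UV scaling, a simple form of block scaling and a log transform; the block boundaries and data are arbitrary examples.

```python
import numpy as np

def mean_center(X):
    return X - X.mean(axis=0)

def uv_scale(X):
    """Unit Variance scaling: divide each centered variable by its standard deviation."""
    Xc = mean_center(X)
    return Xc / Xc.std(axis=0, ddof=1)

def block_scale(X, blocks):
    """Give each block of variables the same total variance.
    `blocks` is a list of column-index arrays; each block is UV scaled
    and then divided by the square root of its number of variables."""
    Xs = uv_scale(X)
    for cols in blocks:
        Xs[:, cols] /= np.sqrt(len(cols))
    return Xs

rng = np.random.default_rng(6)
X = np.abs(rng.normal(size=(9, 12))) + 0.1        # positive, somewhat skewed data
X_log = np.log(X)                                 # nonlinear scaling for skewed data
X_ready = block_scale(X_log, [np.arange(0, 4), np.arange(4, 12)])
X_ready = mean_center(X_ready)                    # remove the mean after scaling
```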
All scaling mentioned up to now was done variable-wise, but scaling may also be needed object-wise. This is especially true for
spectral data where the position of a variable between its neighbors is important. In “PCA of the NIR Spectra of the Nine Bread
Samples” section, there is an example presented of why object-wise scaling of spectral data is needed and how it is carried out.
1. For spectral data the physical nature of the data has to be considered, especially for solids where penetration depth is an
important factor. Absorption, transmission, reflection, emission, fluorescence intensity etc. may have to be transformed into
each other to give useful results.
2. Besides that, object-wise centring and scaling may become useful. There may also be a need for smoothing or calculating pseudo-
derivatives of the spectra.
Fig. 12 (A) Front side of selected specimens (top left KN1; bottom right KN9). (B) Backside of selected specimens.
Fig. 13 Raw data, after mean centering, and after mean centering and UV scaling, respectively.
3. Sometimes a complete transformation into Fourier or Wavelet coefficients gives the desired results.
Some definitions are found in the chemometrics books.28,29,43–45
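As an added illustration of object-wise spectral pretreatment, the sketch below applies Savitzky-Golay smoothing and a pseudo-derivative (using SciPy) to a simulated spectrum; the wavelength grid, band positions, window length and polynomial order are all arbitrary assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(7)
wavelength = np.linspace(1000, 1700, 140)                       # simulated NIR axis, nm
spectrum = (np.exp(-((wavelength - 1210) / 40)**2)
            + 0.5 * np.exp(-((wavelength - 1510) / 30)**2)
            + 0.02 * rng.normal(size=wavelength.size))          # two bands plus noise

smoothed = savgol_filter(spectrum, window_length=11, polyorder=2)             # smoothing
first_derivative = savgol_filter(spectrum, window_length=11, polyorder=2,
                                 deriv=1)                                      # pseudo-derivative
```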
Component | SS fraction explained | Cumulative
1 | 0.60 | 0.60
2 | 0.17 | 0.77
3 | 0.15 | 0.92
4 | 0.06 | 0.98
Fig. 14 (A) Schematic score plot (u1 vs. u2; stars represent objects) and loading plot (p1 vs. p2; circles represent variables), and their combination in a biplot (u1/p1 vs. u2/p2). (B) Score/loading biplot of the composition data, t[1]/p[1] vs. t[2] (Normalized)/p[2] (17.3%), showing the nine samples KN1–KN9 and the variables Energy KJ, Fat US g, Fat Sat g, CH g, Sugar g, Fiber g, Prot g, NaCl g and Price/Kg.
Fig. 15 Broken stick (scree plot) method for selecting the appropriate number of components. The red line shows SS per component and the dashed line shows the broken stick principle. Horizontal axis: component number (1–9).
The number of relevant components to extract is diagnosed using the “broken stick” approach (Fig. 15) (see also the “Number of Components Used A (Rank)” section). Here we can observe a gradient of diminishing influence of the score values in the PC model, presented as eigenvalues (score variance). The stick is “broken” at the third or fourth eigenvalue, meaning that higher components are of less importance and contain too much noise. This is one very simple but often effective way of deciding on the number of relevant components.
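A small added sketch of the broken stick comparison is given below; it assumes the standard broken stick values b_a = (1/K) Σ_{k=a..K} 1/k, expressed as percentages, and compares them with the SS percentage per component.

```python
import numpy as np

def broken_stick(K):
    """Expected percentage of variance for each of K components under the broken stick model."""
    return np.array([sum(1.0 / k for k in range(a, K + 1)) / K
                     for a in range(1, K + 1)]) * 100

rng = np.random.default_rng(8)
X = rng.normal(size=(18, 9))
Xmc = X - X.mean(axis=0)
s = np.linalg.svd(Xmc, compute_uv=False)
explained = 100 * s**2 / (s**2).sum()            # SS per component, in percent

# components are kept as long as they explain more than the broken stick value
keep = explained > broken_stick(len(explained))
n_keep = len(keep) if keep.all() else int(np.argmin(keep))
print("suggested number of components:", n_keep)
```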
As we are now dealing with NIR spectra, there are chemical and sampling issues to be observed, and any clustering in the data will be related to these.
In Fig. 16, the first and second PC from this analysis are plotted. Here, three “clusters” can be identified. One cluster of seven objects is found in the lower left corner of the figure. These specimens are all “wheat contaminated” rye based products. In the lower right corner, there is a cluster of “pure” rye products. At the strongly positive end of the 2nd component, the exotic sprinkled side of some of the specimens seems to have a very strong influence.
Fig. 16 Score plot presenting the 9 raw bread specimens (back and front side, 18 objects), with information added on additives (sprinkles). In this case, the average values of the five NIR spectral measurements on each bread side have only been preprocessed by mean centering.
Table 4 Additional characteristics of the crispbread specimens and the distance in the PC model between front and back. Columns: LAB ID, Base cerealia, Flakes, Seeds/sprinkles, Distance between back and front.
However, the front and back are in some cases very similar and in others very different, depending on the treatment and sprinkles added to the bread. In Table 4, the visually determined distance in the PC model between the front and back of the same specimen is shown. In this case, the NIR spectra pick up chemical differences not visible to the naked eye; looking back at the pictures in Fig. 12, these differences are clearly not apparent.
As seen from the section and cases above, it is complicated to acquire representative spectra from heterogeneous materials such as these specimens. Because of this, the specimens were ground with a mortar and pestle and the powder produced was reanalyzed with the NIR probe. This grinding and homogenization of the samples affects the collection of data in several ways, because homogeneous material, seed oil and core materials will be included in the matrix and optical surface reflections will be reduced.
As with the examples in the previous section, we need to preprocess the data in order to improve the information content and reduce optical scattering etc. What we need is something that enables us to retain the chemical information in these spectra. In Fig. 17, the effects of the pretreatment procedure can be surveyed graphically; panels (A) and (B) still reflect the physical optical effects that mask the chemical information, and the different stages of preprocessing can be followed.
Fig. 17 NIR spectra: (A) raw, (B) off-set corrected, (C) slope corrected. The horizontal axis is variable number (wavelength) and the vertical axis is the absorbance value.
Fig. 17 (Continued).
Fig. 18 Score values (t1-t2) on average spectra of the 9 specimens, this time on homogenised samples of crisp bread. Here we have used SNV and
mean centering to reduce scattering effects.
In Fig. 17, (A) represents the raw NIR spectra, (B) the off-set corrected spectra, in which effects of surface smoothness and surface reflections are eliminated, and (C) the slope corrected spectra, in which the influence of particle size is minimized. The procedure from (A) to (C) thus represents an SNV correction. This preprocessing retains mainly the chemical information, shown in (C).
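The off-set and slope correction from (A) to (C) corresponds to the usual formulation of the Standard Normal Variate (SNV) transform: each spectrum (object) is centered by its own mean and divided by its own standard deviation. A minimal added sketch, with simulated spectra, is:

```python
import numpy as np

def snv(X):
    """Standard Normal Variate: from each spectrum (row) subtract its own mean
    (off-set correction) and divide by its own standard deviation (slope/scatter correction)."""
    means = X.mean(axis=1, keepdims=True)
    stds = X.std(axis=1, ddof=1, keepdims=True)
    return (X - means) / stds

rng = np.random.default_rng(9)
spectra = rng.normal(size=(9, 109)) + np.linspace(0, 1, 109)   # 9 objects, 109 wavelengths
spectra_snv = snv(spectra)
X_ready = spectra_snv - spectra_snv.mean(axis=0)               # variable-wise mean centering before PCA
```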
If a new PC model is calculated based on this new data set (109 wavelengths and 9 objects, mean centering and SNV processed),
the PC1 and PC2 plot can be seen in Fig. 18.
So what would be a significant number of components in this case? If we calculate four components we find that they explain 99.3% of the variation of the data set. The likelihood of a fifth component adding interpretable information is thus very low.
The clustering in the plot can be explained by the fact that the “exotic” ingredients are chemically quite different from the pure flour based breads (KN2-KN4). Obviously, sesame seeds and chia-quinoa have a very different composition from ordinary rye flour. Also, these three specimens have significant amounts of wheat flour in the dough compared to the others.
Looking back at Fig. 14, we can see that analyzing the NIR spectral data gives a similar clustering as in Fig. 18, with some deviations. The main differences are found in the second component.
So what do we get from looking at a 3rd component in this case? This component explains another 11.1% of the variation in the data set. Taking the 3rd component into consideration (Fig. 19), we see that three clusters can now be more clearly identified. Additional value for interpretation is found in the loading line plot (Fig. 20), which shows which wavelengths separate the two lower groups. All three loadings have significant peaks that contribute to the clustering in the score plot. In this plot we can also see which peaks influence the positive and negative score values. The main peak in the second loading (around 1210 nm) is related to C–H bonds (which, interestingly, is not picked up by the first PC score). A peak at 1510 nm for proteins is also observable, and looking at Figs. 14 and 19 we see that the same specimens appear in relation to the protein rich breads. Obviously, more data mining and spectral interpretations are possible but are left out here.
The residual of a model is often ignored, but it can be used to visualize the contribution of noise to a model. In Fig. 21, we can see that after three components there are still relevant peaks in the residual that influence the model. Pushing the model further by calculating additional components gradually shows a diminishing spectral importance and an increasing importance (or disturbance) of noise.
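A plot like Fig. 21 can be produced by taking the standard deviation over the objects of the residual matrix for increasing numbers of components; the added sketch below shows the calculation on simulated data of the same size.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(9, 109))                   # 9 objects, 109 wavelengths (simulated)
Xmc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xmc, full_matrices=False)

for A in (3, 4, 5):
    E = Xmc - (U[:, :A] * s[:A]) @ Vt[:A]       # residual after A components
    residual_std = E.std(axis=0, ddof=1)        # standard deviation over the nine objects
    print(f"A = {A}: mean residual std = {residual_std.mean():.4f}")
```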
Fig. 19 Score values (t2-t3) for the average spectra of the nine homogenised crispbread specimens.
Fig. 20 Loadings for the average spectra of the nine homogenised crispbread specimens.
Fig. 21 Standard deviation over nine objects, plotted against wavelength (1000–1700 nm): the fit for three components (blue) and the residual for three (orange), four (green) and five (magenta) components. Some preliminary peak interpretations (ROH, OH, glucose, CH2, CH3, NH) are shown.
References
1. Stewart, G. On the Early History of Singular Value Decomposition. SIAM Rev. 1993, 35, 551–566.
2. Jolliffe, I. Principal Component Analysis, 2nd ed.; Springer: Berlin, 2002.
3. Wold, S.; Esbensen, K.; Geladi, P. Principal Component Analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
4. Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philos. Mag. 1901, 6, 559–572.
5. Fischer, R.; MacKenzie, W. Studies in Crop Variation. II The Manurial Response of Different Potato Varieties. J. Agric. Sci. 1923, 13, 311–320.
6. Hotelling, H. Analysis of a Complex of Statistical Variables Into Principal Components. J. Educ. Psychol. 1933, 24, 417–441.
7. Albano, C.; Dunn, W.; Edlund, U.; Johansson, E.; Horden, B.; Sjöström, M.; Wold, S. Four Levels of Pattern Recognition. Anal. Chim. Acta 1978, 103, 429–443.
8. Frank, I.; Kowalski, B. Chemometrics. Anal. Chem. 1982, 54, 232R–243R.
9. Smilde, A.; Bro, R.; Geladi, P. Multi-Way Analysis, Applications in the Chemical Sciences, Wiley: Chichester, 2004.
10. Geladi, P.; Grahn, H. Multivariate Image Analysis, Wiley: Chichester, 1996.
11. Sasic, S., Ozaki, Y., Eds.; Raman, Infrared and Near-Infrared Chemical Imaging, Wiley: Hoboken, NJ, 2010.
12. Grahn, H., Geladi, P., Eds.; Techniques and Applications of Hyperspectral Image Analysis, Wiley: Chichester, 2007.
13. Park, B., Lu, R., Eds.; Hyperspectral Imaging Technology in Food and Agriculture, Springer: New York, NY, 2015.
14. Basantia, N., Nollet, L., Kamruzzaman, M., Eds.; Hyperspectral Imaging Analysis and Applications for Food Quality, CRC Press: Boca Raton, FL, 2018.
15. Chang, C. Hyperspectral Imaging. Techniques for Spectral Detection and Classification, Springer: New York, NY, 2003.
16. Lillesand, T.; Kiefer, R.; Chipman, J. Remote Sensing and Image Interpretation, 7th ed.; Wiley: Hoboken, NJ, 2015.
17. Campbell, J.; Wynne, R. Introduction to Remote Sensing, 5th ed.; Guilford Publications: New York, NY, 2011.
18. Croux, C.; Filzmoser, P.; Fritz, H. Robust Sparse Principal Component Analysis. Technometrics 2012, 55, 202–214.
19. Ammann, L. Robust Singular Value Decompositions: A New Approach to Projection Pursuit. J. Am. Stat. Assoc. 1993, 88, 505–514.
20. Strang, G. Introduction to Linear Algebra, 5th ed.; Wellesley Cambridge Press: Wellesley, MA, 2016.
21. Golub, G.; Van Loan, C. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, 2013.
22. Watkins, D. Fundamentals of Matrix Computations, Wiley-Blackwell: Hoboken, NJ, 2010.
23. Davis, J. Statistics and Data Analysis in Geology, Wiley: New York, NY, 1973.
24. Mardia, K.; Kent, J.; Bibby, J. Multivariate Analysis, Academic Press: London, 1979.
25. Johnson, R.; Wichern, D. Applied Multivariate Statistical Analysis, Prentice-Hall: Englewood Cliffs, NJ, 1982.
26. Brereton, R., Ed.; Multivariate Pattern Recognition in Chemometrics, Illustrated by Case Studies, Elsevier: Amsterdam, 1992.
27. McLennan, F., Kowalski, B., Eds.; Process Analytical Chemistry, Blackie Academic & Professional: London, 1995.
28. Henrion, R.; Henrion, G. Multivariate Datenanalyse, Springer: Berlin, 1995.
29. Gemperline, P., Ed.; Practical Guide to Chemometrics, 2nd ed.; CRC Press: Boca Raton, FL, 2006.
30. Jackson, J. A User’s Guide to Principal Components, Wiley: New York, NY, 1991.
31. Gray, V., Ed.; Principal Component Analysis: Methods, Applications, Technology, Nova Science Publishers: New York, NY, 2017.
32. Cross, R., Ed.; Principal Component Analysis Handbook, Clanrye International: New York, NY, 2015.
33. Lawton, W.; Sylvestre, E. Self Modeling Curve Resolution. Technometrics 1971, 13, 617–633.
34. Hamilton, J.; Gemperline, P. Mixture Analysis Using Factor Analysis. II: Self-Modeling Curve Resolution. J. Chemom. 1990, 4, 1–13.
35. Liang, Y.; Kvalheim, O. Resolution of Two-Way Data: Theoretical Background and Practical Problem-Solving. Part 1: Theoretical Background and Methodology. Fresenius’ J.
Anal. Chem. 2001, 370, 694–704.
36. Tauler, R.; Barcelo, D. Multivariate Curve Resolution and Calibration Applied to Liquid Chromatography-Diode Array Detection. TrAC, Trends Anal. Chem. 1993, 12, 319–327.
37. Jiang, J.; Ozaki, Y. Self-Modeling Curve Resolution (SMCR): Principles, Techniques and Applications. Appl. Spectrosc. Rev. 2002, 37, 321–345.
38. Comon, P. Independent Component Analysis, a New Concept? Signal Process. 1994, 36, 287–314.
39. Hyvärinen, A.; Oja, E. A Fast Fixed-Point Algorithm for Independent Component Analysis. Neural Comput. 1997, 9, 1483–1492.
40. Bingham, E.; Hyvärinen, A. A Fast Fixed-Point Algorithm for Independent Component Analysis of Complex Valued Signals. Int. J. Neural Syst. 2000, 10, 1–8.
41. Parastar, H.; Jalali-Heravi, M.; Tauler, R. Is Independent Component Analysis Appropriate for Multivariate Resolution in Analytical Chemistry. TrAC, Trends Anal. Chem. 2012,
31, 134–143.
42. Wold, S.; Sjostrom, M. SIMCA: A Method for Analyzing Chemical Data in Terms of Similarity and Analogy. In Chemometrics Theory and Application; Kowalski, B., Ed.; American
Chemical Society Symposium Series 52, American Chemical Society: Washington, DC, 1977; pp 243–282.
43. Beebe, K.; Pell, R.; Seasholtz, M. Chemometrics. A Practical Guide, Wiley: New York, NY, 1998.
44. Brereton, R. Chemometrics. Data Analysis for the Laboratory and Chemical Plant, Wiley: Chichester, 2003.
45. Naes, T.; Isaksson, T.; Fearn, T.; Davies, T. A User-Friendly Guide to Multivariate Calibration and Classification, NIR Publications: Chichester, 2002.