Chemometrics in Metabonomics

Johan Trygg,† Elaine Holmes,‡ and Torbjörn Lundstedt*,§,|
Research group for Chemometrics, Institute of Chemistry, Umeå University, Sweden, Biological Chemistry,
Biomedicinal Sciences Division, Faculty of Medicine, Imperial College London, Sir Alexander Fleming Building,
London SW7 2AZ, United Kingdom, Department of Pharmaceutical Chemistry, Uppsala University, Sweden,
and AcurePharma, Uppsala, Sweden
Received November 11, 2006
We provide an overview of how the underlying philosophy of chemometrics is integrated throughout metabonomic studies. Four steps are demonstrated: (1) definition of the aim, (2) selection of objects, (3) sample preparation and characterization, and (4) evaluation of the collected data. This includes the tools applied for linear modeling, for example, Statistical Experimental Design (SED), Principal Component Analysis (PCA), Partial least-squares (PLS), Orthogonal-PLS (OPLS), and dynamic extensions thereof. This is illustrated by examples from the literature.
Keywords: Statistical Experimental Design (SED) • PCA • OPLS • Class-specific studies • Dynamic studies •
Multivariate Design
* To whom correspondence should be addressed. E-mail: [email protected].
† Umeå University.
‡ Imperial College London.
§ Uppsala University.
| AcurePharma.
10.1021/pr060594q CCC: $37.00 © 2007 American Chemical Society. Journal of Proteome Research 2007, 6, 469-479.
Published on Web 12/22/2006

Introduction

Our intention with this review is to give an overview of papers applying a chemometrical approach throughout a metabonomic study. We will provide the readers with a number of examples of different types of problems and methods applied. In all of the cases, we will only briefly discuss two or three examples and then provide the readers with references to a number of papers dealing with similar examples in an analogous way. The philosophy presented in this review strongly supports the statement made by Robertson¹ in his metabonomics review: “However, the implementation and interpretation of the technology and data it generates is not something that should be trivialised. Proper expertise in biological sciences, analytical sciences (nuclear magnetic resonance and/or mass spectrometry) and chemometrics should all be considered necessary prerequisites. If these factors are properly considered, the technology can add significant value as a tool for preclinical toxicologists.” We would just like to make a minor change to the last sentence of the citation: “...If these factors are properly considered, the technology can add significant value as a tool for disease diagnosis, pharmacodynamic studies, preclinical toxicologists, and more to come.”

In metabonomics, as well as in other branches of science and technology, there is a steady trend toward the use of more variables (properties) to characterize observations (e.g., samples, experiments, time points). Often, these measurements can be arranged into a data table, where each row constitutes an observation and the columns represent the variables or factors we have measured (e.g., wavelength, mass number, chemical shift). This development generates huge and complex data tables, which are hard to summarize and overview without appropriate tools. However, in biology, chemometric methodology has been largely overlooked in favor of traditional statistics. It is not until recently that the overwhelming size and complexity of the ‘omics’ technologies has driven biology toward the adoption of chemometric methods. That includes efficient and robust methods for modeling and analysis of complicated chemical/biological data tables that produce interpretable and reliable models capable of handling incomplete, noisy, and collinear data structures. These methods include principal component analysis² (PCA) and partial least-squares³,⁴ (PLS). It is also important to stress that chemometrics provides a means of collecting relevant information through statistical experimental design⁵⁻⁷ (SED).

The underlying philosophy of chemometrics, in combination with the chemometrical toolbox, can efficiently be applied throughout a metabonomic study. The philosophy is needed from the very start of a study, through the whole process, to the biological interpretation.

The Study Design: Make Data Contain Information. The metabonomics approach is more demanding on the quality, accuracy, and richness of information in data sets. Statistical Experimental Design (SED)⁵,⁶ is recommended to be used through the whole process, from defining the aim of the study to the final extraction of information.

The objective of experimental design is to plan and conduct experiments in order to extract the maximum amount of information in the fewest number of experimental runs. The basic idea is to devise a small set of experiments, in which all pertinent factors are varied systematically. This set usually does not include more than 10-20 experiments. By adding additional experiments, one can investigate factors more thoroughly, for example, the time dependence from two to five time points. In addition, the noise level is decreased by means of
averaging, the functional space is efficiently mapped, and interactions and synergisms are seen. Antti et al.⁸ have applied a statistical experimental design to investigate the effect of the dose of hydrazine and time on liver toxicity. The results from the NMR and clinical chemistry were evaluated by PLS. The PLS analysis could also reveal the correlation pattern between the different blocks as well as within blocks according to dose, time, and the interaction between time and dose.

Extract Information from Data. In metabonomic studies, the observations and samples are often characterized using modern instrumentation such as GC-MS, LC-MS, and LC-NMR spectroscopy. The analytical platform is important and largely determined by the biological system and the scientific question.

Multivariate analyses based on projection methods represent a number of efficient and useful methods for the analysis and modeling of these complex data. Principal Component Analysis² (PCA) is the workhorse in chemometrics. PCA makes it possible to extract and display the systematic variation in the data. A PCA model provides a summary, or overview, of all observations or samples in the data table. In addition, groupings, trends, and outliers can also be found. Hence, projection-based methods represent a solid basis for metabonomic analysis. Canonical correlation,⁹ correspondence analysis,¹⁰ neural networks,¹¹⁻¹² Bayesian modeling,¹³ and hidden Markov models¹⁴ represent additional modeling methods but are outside the scope of this review.

Metabonomic studies typically constitute a set of controls and treated samples, including additional knowledge of the samples, for example, dose, age, gender, and diet. In these situations, a more focused evaluation and analysis of the data is possible. That is, rather than asking the question “What is there?”, one can start to ask, “What is its relation to?” or “What is the difference between?”. In modeling, this additional knowledge constitutes an extra data table, that is, a Y matrix. Partial least-squares³ (PLS) and Orthogonal-PLS¹⁵⁻¹⁹ (OPLS) represent two modeling methods for relating two data tables. The Y data table can contain both quantitative (e.g., age, dose concentration) and qualitative (e.g., control/treated) data.

Chemometric Approach to Metabonomic Studies

1. Step 1: Define the Aim. It is important to formulate the objectives and goals of the metabonomic study. A number of questions have to be answered and/or taken into consideration in both the design of the study and the evaluation of the outcome. For example: What is previously known? What additional information is needed? How can the objectives be reached; that is, what experiments are needed and how should they be performed? If these questions cannot be answered, there is no point in continuing.

2. Step 2: Selection of Objects. The selection of the objects (e.g., samples, individuals) needs to span the experimental domain in a balanced and systematic manner. To be able to do this, we have to characterize the objects with both measured and observed descriptors. This often includes setting up specific inclusion and exclusion criteria for the study, such as age span (e.g., 18-45 years), body mass index (e.g., 20-30), medicinal chemistry profiles (e.g., lipids, glucose), gender, tobacco habits, and use of drugs. In addition to those criteria, additional information regarding each object is collected by questionnaires that include life style factors, food and drinking habits, social situation, and so on. This collected information represents a multivariate profile (with K-descriptors) for each object that is a fingerprint of its inherent properties.

Geometrically, the multivariate profile represents one point in K-dimensional space, whose position (coordinates) in this space is given by the values in each descriptor. For multiple profiles, it is possible to construct a two-dimensional data table, an X matrix, by stacking each multivariate profile on top of each other. The N rows then produce a swarm of points in K-dimensional space.

2.1. Projection-Based Methods. The main, underlying assumption of projection-based methods is that the system or process under consideration is driven by a small number of latent variables (LVs).²⁰ Thus, projection-based methods can be regarded as a data analysis toolbox for indirect observation of these LVs. This class of models is conceptually very different from traditional regression models with independent predictor variables. They are able to handle many, incomplete, and correlated predictor variables in a simple and straightforward way, hence their wide use.

Projection methods convert the multidimensional data table into a low-dimensional model plane that approximates all rows (e.g., objects or observations) in X, that is, the swarm of points. The first PCA model component ($t_1 p_1^T$) describes the largest variation in the swarm of points. The second component models the second largest variation, and so on. All PCA components are mutually orthogonal; see Figure 1. The scores (T) represent a low-dimensional plane that closely approximates X, that is, the swarm of points. A scatter plot of the first two score vectors ($t_1$ versus $t_2$) provides a summary, or overview, of all observations or samples in the data table. Groupings, trends, and outliers are revealed. The position of each object in the model plane is used to relate objects to each other. Hence, objects that are close to each other have a similar multivariate profile, given the K-descriptors. Conversely, objects that lie far from each other have dissimilar properties.

Analogous to the scores, the loading vectors ($p_1$, $p_2$) define the relation among the measured variables, that is, the columns in the X matrix. A scatter plot of the loading vectors, known as the loading plot, shows the influence (weight) of the individual X-variables in the model. An important feature is that directions in the score plot correspond to directions in the loading plot, for example, for identifying which variables (loadings) separate different groups of objects (the scores). This is a powerful tool for understanding the underlying patterns in the data. Hence, projection-based methods represent a solid basis for metabonomic analysis.

The part of X that is not explained by the model forms the residuals (E) and represents the distance between each point in K-space and its projection on the plane. The scores, loadings, and residuals together describe all of the variation in X.

$$X = TP^T + E = t_1 p_1^T + t_2 p_2^T + E$$

2.2. Multivariate Design. The need and usefulness of experimental design in complex systems should be emphasized, because it creates a controlled setting of the environment, even though most of the variation between the different objects is uncontrolled. Multivariate design (MVD)²¹,²² is a combination of multivariate characterization (MVC),²³⁻²⁵ principal component analysis (PCA), and Statistical Experimental Design (SED) to select a diverse set of objects that represents all objects, that is, spans the variation. There are a number of different experimental designs that can be applied to span the variation in a
systematic way and obtain well-balanced data. The most commonly used are factorial designs⁶ and D-optimal designs²⁶ that fulfill the criteria of balanced data and orthogonality. In MVD, the principal component model scores, for example, $t_1$ and $t_2$, are used to select the objects; see Figure 2. The selection is based on diversity between the objects.

Figure 1. A principal component analysis (PCA) model approximates the variation in a data table by a low-dimensional model plane. This model plane provides a score plot, where the relation among the observations or samples in the model plane is visualized, for example, if there are any groupings, trends, or outliers. The loading plot describes the influence of the variables in the model plane, and the relation among them. An important feature is that directions in the score plot correspond to directions in the loading plot, and vice versa.

Figure 2. Four objects are selected according to a multivariate design that spans the biological variation.

3. Step 3: Sample Preparation and Characterization. In metabonomics, it is important to keep the experimental and biological variation at a minimum. At the same time, the metabolic analysis should be global, quantitative, robust, reproducible, accurate, and interpretable. In addition, the physicochemical diversity of metabolites (amino acids, fatty acids, carbohydrates, and organic acids) raises problems for extraction and working procedures for different analytical techniques. Here, statistical design of experiments represents an important strategy to systematically investigate factors and optimize the experimental protocols. Typical working procedures for NMR spectroscopy for biofluids and tissue extraction are found in Appendix 4 of the SMRS Policy document.²⁷ For GC-MS, see refs 28 and 29.

Dumas et al.³⁰ assessed the analytical reproducibility of human urine samples characterized by ¹H-NMR in the INTERMAP study. INTERMAP was launched in 1996 to investigate the relationship of multiple dietary variables to blood pressure. The conclusion was that most errors are due to specimen handling inhomogeneity.

Multiple factor analysis (MFA) was used by Dumas et al.³¹ to integrate NMR and MS data for metabolic fingerprinting on cattle treated with anabolic steroids. Only minor overlap was found in the correlation structure between the MS and NMR variables. They underline the relation of a multivariate profile not only to the biological information content, but also to its inherent signature from the analytical instrument used.

4. Step 4: Evaluation of the Collected Data. In contrast to a ¹H-NMR spectrum, data collected from hyphenated instruments such as GC-MS, LC-MS, and UPLC-NMR must be processed before multivariate analysis. The reason is the two-dimensional nature (e.g., chromatogram/mass spectra) of the data for each sample. Curve resolution or deconvolution methods are mainly applied for data processing³²⁻³⁶ that result

in a multivariate profile for each sample. Since a variable in a data table should define the same property over all samples, variability in NMR peak shifts causes problems for statistical modeling. Because of this, a multitude of different peak alignment methods have been developed.³⁷,³⁸ Variability of chemical shifts in ¹H-NMR spectra of biofluids, for example, due to pH variation, metal-ion concentrations, and chemical exchange phenomena, has spurred the development of bucketing and peak alignment methods. A commonly used method involves bucketing the data, where signal integration within spectral regions is performed.³⁹ An alternative approach has been to use automatic peak alignment methods to resolve the problem of peak position variation. Stoyanova et al.⁴⁰ removed the positional noise using PCA. Forshed et al.⁴¹,⁴² and Lee et al.⁴³ have applied genetic algorithms to align segments of spectra to determine the misalignment across a series of NMR spectra. Cloarec et al.¹⁸ applied the OPLS method to evaluate the influence of typical peak position variation on the robustness of pattern recognition methods and demonstrated that the inclusion of variable peak position can be beneficial and lead to useful biochemical information. Typically, alignment methods rely upon having a master or reference profile.

Projection-based methods are sensitive to scaling of the variables. Scaling of variables changes the length of each axis in the K-dimensional space. The primary objective of scaling is to reduce the noise in the data, and thereby enhance the information content and quality. Column centring, whereby the mean trajectory is removed from the data, is followed by either no scaling or Pareto scaling of the variables. Pareto scaling is recommended for metabonomic data and is done by dividing each variable by the square root of its standard deviation.

Principal component analysis is used to get an overview of the multivariate profiles. Examining the scatter plot of the first two score vectors ($t_1$ versus $t_2$) reveals the homogeneity of the data, any groupings, outliers, and trends. Strong outliers are found as deviating points in the scatter plot. The Hotelling’s T² region, shown as an ellipse in Figure 3 (left panel), defines the 95% confidence interval of the modeled variation.⁴⁴ Outliers may also be detected in the model residuals. The distance to model plot⁴⁵ (DModX) can be used and is a statistical test for detecting outliers based on the model residual variance; see Figure 3 (right panel).

Figure 3. In the score plot (left panel), the confidence interval is defined by the Hotelling’s T² ellipse (95% confidence interval), and observations outside the confidence ellipse are considered outliers. Outliers can also be detected by the distance to model parameter, DModX, based on the model residuals (right panel).

Interesting individual observations, such as outliers, can be examined and interpreted by the contribution plot.⁴⁶ It displays the weighted difference between the observation and the model center. Hence, we can identify what is unique (deviating) for an observation compared to “normality”. Similarly, the contribution plot can also be used for comparing different observations.

PCA modeling has been used to assess the statistical differentiation between groups, with the covariance loadings plot used for biochemical interpretation. One example is the paper by Akira et al.,⁴⁷ in which the biochemical changes between hypertensive rats and their normotensive controls were investigated to provide insight into blood pressure regulation. The study design included six male rats from each class, and urine was sampled twice, at 12 and 26 weeks of age. PCA has been frequently applied in the evaluation of metabonomic data and should be the method of choice for obtaining an overview, finding clusters, and identifying outliers. For a few different examples of applications, see refs 48-55.

Haluska and Powers⁵⁶ have discussed the drawback of spectral noise when evaluating the data by PCA and suggest simply removing the noise regions by setting a threshold and using only the signals above the spectral noise in the PCA. Another approach would be to utilize the prior knowledge gained in “Study Design”, which gives us the ability to separate the observations into at least two different classes, and thereby use more advanced multivariate methods such as SIMCA, PLS-DA, and/or OPLS-DA.

Class Specific Studies

Most of the published papers within the field deal with classification problems such as disease diagnosis or treated versus control, that is, to identify a group of control observations and another group of observations known to have a specific disease. In a number of papers, several classes can be identified, but in all of these papers, the evaluation has been made as a two-class case.

Two-class problem: Disease and control observations define two separate classes.

One-class problem: Only disease observations define a class; control samples are too heterogeneous, for example, due to other variations caused by diseases, gender, age, diet, lifestyle, genes, unknown factors, and so on.
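The overview diagnostics described under Step 4 (Pareto scaling, a PCA score matrix, Hotelling's T² and a distance-to-model measure) also underlie the class models discussed next. The following is a minimal NumPy sketch of these quantities, not the authors' implementation. The function names are hypothetical, PCA is computed by SVD rather than by an iterative algorithm, the DModX expression is a simplified normalized residual standard deviation, and the 95% limits are not computed. The data are simulated for illustration only.

```python
import numpy as np

def pareto_scale(X):
    """Mean-center each column, then divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # assumption: constant columns are only centered
    return (X - X.mean(axis=0)) / np.sqrt(sd)

def pca_svd(Xs, n_components=2):
    """Decompose scaled data as Xs = T P^T + E using the SVD."""
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings (orthonormal columns)
    E = Xs - T @ P.T                             # residuals
    return T, P, E

def hotelling_t2(T):
    """Hotelling's T2 per observation: squared scores weighted by score variance."""
    return ((T ** 2) / T.var(axis=0, ddof=1)).sum(axis=1)

def dmodx(E, n_components):
    """Simplified DModX: row residual SD divided by the pooled residual SD."""
    n, k = E.shape
    a = n_components
    s_row = np.sqrt((E ** 2).sum(axis=1) / (k - a))
    s_pooled = np.sqrt((E ** 2).sum() / ((n - a - 1) * (k - a)))
    return s_row / s_pooled

# Simulated data (illustration only): 30 samples, 50 variables, sample 0 perturbed.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50)) + 10.0
X[0, :5] += 8.0
Xs = pareto_scale(X)
T, P, E = pca_svd(Xs, n_components=2)
print("sample with largest T2:", int(np.argmax(hotelling_t2(T))))
print("sample with largest DModX:", int(np.argmax(dmodx(E, 2))))
```

Observations with a large T² deviate within the model plane, whereas a large DModX indicates a profile that the plane does not describe well, which is why the two diagnostics are usually inspected together.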

Soft Independent Modeling of Class Analogy (SIMCA). The SIMCA⁵⁷ method is a supervised classification method based on PCA. The idea is to construct a separate PCA model for each known class of observations. These PCA models are then used to assign class membership to observations of unknown class origin by predicting these observations into each PCA class model, where the boundaries have been defined by the 95% confidence interval. Observations that are poorly predicted by the PCA class model, and hence have large residuals, are classified as being outside the PCA model and do not belong to the class.

The SIMCA model, as shown in Figure 4 (left panel), illustrates only one class of observations with strong homogeneity that is well-modeled by PCA. This is commonly referred to as the asymmetric case. In Figure 4 (right panel), there are two homogeneous classes of observations, each separately modeled by PCA. New observations are predicted into each model, and assigned as belonging to either of the classes, none of the classes, or both of the classes.

Figure 4. Illustration of SIMCA classification. In the left panel, the one-class classifier is shown, referred to as the asymmetric case. In the right panel, the SIMCA classification is shown with two classes, separately modeled by PCA.

The SIMCA method is recommended for the one-class case, when we have one well-defined class of objects and all other objects are inhomogeneous, that is, the asymmetric case. Another situation in which SIMCA is the method of choice is classification where no interpretation is needed.

Dumas et al.⁵⁸ have evaluated the anabolic treatment signature in cattle urine using NMR by an array of different statistical methods such as ANOVA, LDA, and SIMCA classification for a two-class case. This is a typical case when SIMCA can be used, since only separation between cattle treated with steroids and nontreated cattle is initially required and less focus is on interpretation.

A few different examples wherein SIMCA has been applied are given in refs 59-61. However, if we have information from the study design regarding classes (sick/healthy, treated vs nontreated, etc.), it is recommended to use other supervised methods such as PLS-DA in the two-class cases (or multiple class cases) and/or preferably OPLS-DA¹⁹ to facilitate the interpretation (Figure 5).

Figure 5. Class information can also be used to construct an additional matrix, hereinafter called the Y matrix, consisting of a discrete ‘dummy’ variable where [1]/[0] indicate the class belonging.

Partial Least-Squares (PLS) Method by Projections to Latent Structures. PLS³,⁴,⁶²,⁶³ is a method commonly used where a quantitative relationship is sought between a matrix, X, usually comprising spectral or chromatographic data of a set of calibration samples, and another matrix, Y, containing quantitative values, for example, concentrations of endogenous metabolites. PLS can also be used in discriminant analysis, that is, PLS-DA. The Y matrix then contains qualitative values, for example, class belonging, gender, and treatment of the samples. The PLS model can be expressed by

Model of X: $X = TP^T + E$

Model of Y: $Y = TC^T + F$

PLS models are negatively affected by systematic variation in the X matrix that is not related to the Y matrix, that is, not part of the joint correlation structure between X and Y. This leads to some pitfalls regarding interpretation and has potentially major implications for our selection of metabolite biomarkers; for example, positive correlation patterns can be interpreted as negligible or even become negative.

Wang et al.⁶⁴,⁶⁵ used LC/MS for profiling of the plasma phospholipids in Type 2 diabetes mellitus (DM2). By PLS-DA, it was possible not only to differentiate the DM2 group from the control group, but also to identify possible biomarkers. A number of examples applying PLS-DA have been published; for example, see refs 66-70.
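To make the PLS model equations and the use of a dummy Y variable concrete, here is a minimal sketch of PLS-DA for two classes, assuming a single 0/1 response column and the classical NIPALS formulation of PLS. The function name `pls1_nipals`, the simulated data, and the 0.5 decision threshold are illustrative assumptions, not taken from the papers cited above.

```python
import numpy as np

def pls1_nipals(X, y, n_components=2):
    """PLS for one response column by NIPALS; X and y must already be centered.

    Returns scores T, loadings P, y-weights c, residuals E and f,
    and regression coefficients B, so that X = T P^T + E and y = T c + f.
    """
    E = np.asarray(X, dtype=float).copy()
    f = np.asarray(y, dtype=float).copy()
    n, k = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((k, n_components))
    W = np.zeros((k, n_components))
    c = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)      # weight vector of unit length
        t = E @ w                   # score vector
        tt = t @ t
        p = E.T @ t / tt            # X loading
        c[a] = f @ t / tt           # y weight
        E -= np.outer(t, p)         # deflate X
        f -= c[a] * t               # deflate y
        T[:, a], P[:, a], W[:, a] = t, p, w
    B = W @ np.linalg.solve(P.T @ W, c)   # coefficients for centered X
    return T, P, c, E, f, B

# Simulated two-class data (illustration only): class 1 is shifted in two variables.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 10)            # dummy Y variable: 0 = control, 1 = treated
X = rng.normal(size=(20, 8))
X[labels == 1, :2] += 3.0
x_mean, y_mean = X.mean(axis=0), labels.mean()
T, P, c, E, f, B = pls1_nipals(X - x_mean, labels - y_mean, n_components=2)
y_hat = (X - x_mean) @ B + y_mean
predicted = (y_hat > 0.5).astype(int)     # assumed decision threshold
print("training accuracy:", (predicted == labels).mean())
```

Training accuracy is shown only to exercise the code. In practice, the number of components and the predictive ability would be assessed by cross-validation or an external test set.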

Figure 6. A geometrical illustration of the difference between the PLS-DA and OPLS-DA models. In the left panel, the PLS components cannot separate the between-class variation from the within-class variation, and the resulting PLS component loadings mix both types of variation. In the right panel, the OPLS components are able to separate these two types of variation. Component 1 ($t_{1p}$) is the predictive component and displays the between-class ([blue circles], [yellow squares]) variation of the samples. The corresponding loading profile can be used for identifying variables important for the class separation. Component 2 ($t_{2o}$) is the Y-orthogonal component and models the within-class variation.
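The separation of between-class and within-class variation illustrated in Figure 6 can be sketched numerically. The code below assumes a single centered response column and follows the commonly described NIPALS-style procedure for OPLS with one predictive component. The function name `opls_single_y` and the simulated data are hypothetical, and this is a sketch under those assumptions rather than the reference implementation.

```python
import numpy as np

def opls_single_y(X, y, n_orthogonal=1):
    """OPLS sketch: one Y-predictive and n_orthogonal Y-orthogonal components.

    X and y must be centered. Returns t_p, p_p, c_p, To, Po, E, F with
    X = t_p p_p^T + To Po^T + E and y = c_p t_p + F.
    """
    Xf = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    n, k = Xf.shape
    w = Xf.T @ y
    w /= np.linalg.norm(w)               # predictive weight, unit length
    To = np.zeros((n, n_orthogonal))
    Po = np.zeros((k, n_orthogonal))
    for a in range(n_orthogonal):
        t = Xf @ w
        p = Xf.T @ t / (t @ t)
        w_o = p - (w @ p) * w            # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = Xf @ w_o                   # Y-orthogonal score
        p_o = Xf.T @ t_o / (t_o @ t_o)
        Xf -= np.outer(t_o, p_o)         # remove Y-orthogonal variation from X
        To[:, a], Po[:, a] = t_o, p_o
    t_p = Xf @ w                         # Y-predictive score
    p_p = Xf.T @ t_p / (t_p @ t_p)
    c_p = (y @ t_p) / (t_p @ t_p)
    E = Xf - np.outer(t_p, p_p)
    F = y - c_p * t_p
    return t_p, p_p, c_p, To, Po, E, F

# Simulated data (illustration only): a class effect in variable 0 and a
# class-independent source of variation shared by variables 1 and 2.
rng = np.random.default_rng(2)
y = np.repeat([0.0, 1.0], 15)
X = rng.normal(size=(30, 6))
X[:, 0] += 2.0 * y
X[:, 1:3] += 3.0 * rng.normal(size=(30, 1))
Xc, yc = X - X.mean(axis=0), y - y.mean()
t_p, p_p, c_p, To, Po, E, F = opls_single_y(Xc, yc, n_orthogonal=1)
print("t_o . y =", float(To[:, 0] @ yc))  # numerically zero by construction
```

Because the orthogonal score has zero covariance with y, the predictive loading `p_p` is the natural place to look for class-separating variables, while `Po` describes the within-class variation.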
In Brindle et al.,⁷¹ PLS-DA was applied together with a multivariate preprocessing filter called orthogonal signal correction (OSC) for developing a diagnostic tool for predicting the severity of coronary heart disease based on NMR spectral profiles of human serum. The OSC filter removes the uncorrelated signals, resulting in information on the within-class variation. Wagner et al. report the use of this in a paper⁷² wherein OSC was applied to investigate the background information that was not due to exposure to the compound acetaminophen. As stated by Wagner et al., the OSC component surprisingly provided an additional classification of males and females. These observations lead us to discuss the OPLS method.

The Orthogonal-PLS Method (OPLS). The OPLS¹⁵ method is a recent modification of the PLS method.³ The main idea of OPLS is to separate the systematic variation in X into two parts, one that is linearly related to Y and one that is unrelated (orthogonal) to Y. This partitioning of the X-data facilitates model interpretation and model execution on new samples.¹⁵,¹⁹ The OPLS model comprises two modeled variations, the Y-predictive ($T_p P_p^T$) and the Y-orthogonal ($T_o P_o^T$) components. Only the Y-predictive variation is used for the modeling of Y ($T_p C_p^T$).

Model of X: $X = T_p P_p^T + T_o P_o^T + E$

Model of Y: $Y = T_p C_p^T + F$

E and F are the residual matrices of X and Y, respectively. OPLS can, analogously to PLS-DA, be used for discrimination (OPLS-DA); see, for instance, ref 19. In Figure 6, the advantages of OPLS-DA compared to PLS-DA are exemplified. The between-class variation and the within-class variation are separated by OPLS-DA but not by PLS-DA, and this facilitates the OPLS-DA model interpretation.

In Cloarec et al.,⁷³ a combined covariation and correlation-loading plot provided additional information on the physicochemical variations in the data. This was done by means of coloring each covariance loading by its correlation value to class separation.

Stella et al.⁷⁴ have illustrated the use of OPLS for characterization of metabolic profiles due to different diets and thereby identified differences in metabolic pattern between a low-meat diet and a vegetarian diet. This is the first systematic study reported on the dietary effects on the metabolism.

Dynamic Studies

Metabonomic studies that involve the quantification of the dynamic metabolic response are best evaluated using sequential sampling over an appropriate time course. The evaluation of human biofluid samples is further complicated by a high degree of normal physiological variation caused by genetic and lifestyle differences. Dynamic sampling makes it possible to evaluate and handle the different types of variations such as individual differences in metabolic kinetics, circadian rhythm, and fast and slow responders.

Dynamic Sampling. Biological processes are dynamic by nature, that is, there is a temporal progression. Some problems are caused by quick and slow responders following intervention or treatment.

For this reason, the study design is laid out as sequential samples over an appropriate time course to capture individual trajectories. Sampling period and interval are based on the expected or known pharmacokinetics of the expected effect. In other words, statistical experimental design is used to maximize the information content and increase the chances of capturing all possible variations of responses. This allows flexibility in the subsequent analysis and an unbiased evaluation of each individual’s kinetic profile. This also implies that the often assumed control (or pre-dose) and treated modeling approach is not optimal, as it fails to take into account the individual dynamics, for example, slow and fast responders. In addition, for dynamic studies, the traditional control group does not exist. Instead, each individual (object) is its own reference control.

Coen et al.⁷⁵ retrieved gene expression data and performed different types of NMR experiments on liver tissue from rats dosed with acetaminophen. The study design included 70 rats, 5 time points, and 4 different dose levels. Statistical analysis

and biochemical interpretation were performed using ANOVA and PCA modeling. In Smilde et al.,⁷⁶ the ANOVA-simultaneous component analysis (ASCA) method is described as a direct generalization of analysis of variance (ANOVA) for univariate data to the multivariate case. ASCA was demonstrated on an intervention study examining the effect of vitamin C on the development of OA in guinea pigs.

Dieterle et al.⁷⁷ applied metabonomics to a compound ranking toxicity study where rat urine was collected and characterized using NMR spectroscopy. The study design included controls, multiple dose groups, and two time points. Arrays of classical end points were also assessed besides the NMR profile. Statistical modeling included PCA and PLS modeling.

Bollard et al.⁷⁸ presented an additional approach to dynamical studies by using the metabolic principal component trajectories to highlight the maximum effect. The data were scaled by scaling to maximum aligned and reduced trajectories (SMART), to remove differences in individual starting positions of the objects and varying magnitudes of effects and thereby facilitate the comparison between the objects. This type of SMART-PCA is further illustrated in Keun et al.⁷⁹ Other examples of dynamic studies wherein PCA has been used, or the evaluation of data has been handled as a two-class problem with PLS-DA, are given in refs 80-84.

A different approach to handling dynamic metabolic data was presented by Antti et al.,⁸⁵ who studied the time-dependent effect of hydrazine on the metabolic profiles for rats. Urine

Figure 7. In batch modeling, the data is organised as an X-matrix containing blocks of rows where each block represents an object (e.g., a rat). Each row in a block represents the multivariate profile of an observation (e.g., the NMR spectral shifts) at a specific time point. The corresponding row in the Y-matrix contains the dynamics (e.g., the time point). This is followed by a PLS or OPLS model to extract variation from the X-matrix related to the dynamics of the system.

over time; see Figure 8. For multiple objects (e.g., control rats), it is possible to establish an average trajectory with upper and lower limits based on standard deviations. These indicate the normal development of the object, for example, control rats. The established control charts from the model can be used to monitor the development of new objects and to detect deviations from normality, for example, the effect of a toxin or drug. Observed deviations from normality can be interpreted by means of contribution plots. Batch modeling is based on the assumption that a control group of objects is followed over the same time period as the treated group.

A drawback with batch modeling is that all study objects must have a similar metabolic and response rate; we cannot have slow and fast responders in the same model.

Multi-“omics” Studies

Lindon et al.⁸⁷ discussed the huge problem of combining and using the data obtained in metabonomics studies from an array of analytical chemical methods as well as metabolic profiles from different compartments, and further on, using this combined information for diagnosis, understanding physiological variation, drug therapy monitoring, as well as effect evaluation. If this can be handled, then metabonomics will be an integral part in the development of personalized medicine.

Attempts to approach this type of problem have been made by Klenø et al.,⁸⁸ by studying combined genomics, proteomics, and metabonomics data. Proteins and endogenous metabolites were identified as altered in rats treated with hydrazine compared with untreated controls. The study design included a control group and two dosed groups with 10 rats in each. Statistical evaluation and interpretation were performed separately and provided an insight into the underlying biochemistry. Rantalainen et al.⁸⁹ demonstrated a novel approach using the OPLS method to integrate 2D-DIGE proteomic and NMR metabolic data from a human tumor xenograft mouse model of prostate cancer.

Yap et al.⁹⁰ used partial least-squares-based cross-correlation of the variation in the liver and plasma profiles and found that an increase in liver lipids correlated with a decrease in plasma lipids; this method will work nicely for two compartments. However, the complexity will rapidly increase with the number of compartments and “omic” data. This problem has been stressed by Craig et al.,⁹¹ wherein the difficulties of combining and interpreting multi-omic data are discussed.

One way of approaching the multicompartment problem is
samples were collected from dosed rats and matched control presented by Martin et al.92 They have applied H-PCA followed
animals at several time points up to 168 h post-dose. The by OPLS-DA. This resulted in a model where both inter- and
samples were analyzed by NMR and evaluated by batch intracompartment covariance and correlations easily can be
modeling. evaluated and thereby increase the understanding of biochemi-
Batch Modeling. Batch modeling86 is routinely being used cal mechanisms and relations between different compartments.
for analysis of industrial batch process data. A batch process Hierarchical PCA. The idea behind hierarchical PCA is to
has a finite duration in time, in contrast to a continuous block the variables to improve transparency and interpret-
process. By analogy, batch modeling methods are used in ability.93-95 This method operates on two or more levels. On
metabonomic studies to model the time dependency or each level, standard PCA scores and loading plots, as well as
dynamics of biological processes,for example, the evolution of residuals and their summaries, such as DModX, are used for
a toxic substance in rats. Data collected from such studies interpretation. The procedure for two levels can be described
produce a three-way data table where each dimensionality as follows (see Figure 9): In the first step in this case is to divide
represents objects (e.g., rat urine or plant extract samples), the large matrix into conceptually meaningful blocks and make
variables (e.g., NMR shifts, m/z), and sample time points a separate Principal Component Analysis for each matrix. In
(Figure 7). Batch modeling is based on modeling two levels, the next step, the principal components (scores T) from each
the observation level and the batch level. The observation level of these models become the new variables (“super variables”)
shows the dynamics of the biological process of each object describing the systematic variation from each block. In the final

step, a PCA model is fitted to this data and the hierarchical PCA model is established (see Figure 9).

The interpretation of a hierarchical model has to be done in two steps. First, the loading plots of the hierarchical model reveal which of the blocks are most important for any groupings that can be seen in the hierarchical score plot. Second, the loading plots for the blocks of interest are studied on the lower level, and in the corresponding loading plot, the original variables of importance can be identified. Hierarchical PCA is easily extended to one type of hierarchical PLS or PLS-DA by adding a Y (response/discriminant) matrix on the upper level. The interpretation is done in analogy with PLS or PLS-DA on the upper level and as in H-PCA on the lower level.

Figure 8. Batch control charts can be constructed from the score vectors of a PLS or OPLS batch model. The average score trajectory (for each component) with upper and lower control limits (based on standard deviations) indicates the normal dynamic trajectory for a batch. The control chart can be used for detecting deviations from normality.

Figure 9. H-PCA is shown from the bottom to the top. At the bottom of the figure, the data matrix is divided into blocks. A separate PCA model is calculated for each block, and the PCA score components from each model are then combined to form a new matrix, summarizing all blocks. This new block of data is then analyzed by a PCA.

Discussion And Future Remarks

In this review, we have provided an overview of how the underlying philosophy of chemometrics can be integrated throughout metabonomic studies. We have been able to illustrate each separate step with different examples from the literature showing the state of the art. The most common chemometrical tool used in the evaluation of a metabonomic study is PCA. PCA is always recommended as a starting point for analyzing multivariate data and will rapidly provide an overview of the information hidden in the data. Unfortunately, in a majority of the reviewed papers, the PCA method is the only tool applied. Often additional information can be extracted by using more advanced multivariate methods. In a few papers, PLS-DA and/or OPLS-DA have been used for modeling two classes of data to increase the class separation, simplify interpretation, and find potential biomarkers. For the two-class problem, OPLS-DA is recommended to obtain a clearer and more straightforward interpretation. It can also provide an understanding of the interclass variation.

A few papers also evaluate dynamic data, and one of the approaches used is batch modeling, a PLS-based method. A major drawback with this method is the assumption that all study objects have similar starting profiles and dynamics, for example, responding at the same metabolic rate to a treatment. This problem may be controlled by using a multivariate design for selecting the objects, thereby introducing a “controlled” biological variation.

There is a general lack in applying statistical experimental design (SED) to ensure balanced data and to have a defined experimental domain. The problems with multiomics data and combining data from different compartments have been discussed in a few papers. In one of the papers, the interesting multivariate approach of combining H-PCA with OPLS is suggested. This combination results in a straightforward way to handle the data as well as simplifying the interpretation, wherein different compartments are combined.

A future outlook for chemometrics in metabonomics is that the benefits of statistical experimental design in conjunction with more focused modeling methods such as PLS and OPLS become more widely known and applied to a much greater extent, not only for the two-class problems, but also for dynamic studies. However, it is likely to take some time until a fully integrated multivariate approach is published, based on the chemometric philosophy.

References

(1) Robertson, D. G.; Reily, M. D.; Baker, J. D. Metabonomics in preclinical drug development. Expert Opin. Drug Metab. Toxicol. 2005, 1 (3), 363-376.
(2) Jackson, J. E. A Users Guide to Principal Components; Wiley: New York, 1991.
(3) Wold, S.; Ruhe, A.; Wold, H.; Dunn, W. J., III The Collinearity problem in linear regression. The partial least squares approach to generalized inverses. SIAM J. Sci. Stat. Comput. 1984, 5 (3), 735-743.
(4) Wold, S.; Martens, H.; Wold, H. Lecture Notes in Mathematics Proc. Conf. Matrix pencils, Piteå, Sweden; Springer-Verlag: Heidelberg, 1983.
(5) Lundstedt, T.; Seifert, E.; Abramo, L.; Thelin, B.; Nyström, A.; Pettersen, J.; Bergman, R. Experimental design and optimization. Chemom. Intell. Lab. Syst. 1998, 42, 3-40.
(6) Box, G. E. P.; Hunter, W. G.; Hunter, J. S. Statistics for Experimenters; John Wiley & Sons: New York, 1978.
(7) Eriksson, L.; Johansson, E.; Kettaneh Wold, N.; Wikström, C.; Wold, S. Design of Experiments - Principles and Applications; Umetrics AB, Umeå, Sweden, 1996.
(8) Antti, H.; Ebbels, T. M. D.; Keun, H. C.; Bollard, M. E.; Beckonert, O.; Lindon, J. C.; Nicholson, J. K.; Holmes, E. Statistical experimental design and partial least squares regression analysis of biofluid metabonomic NMR and clinical chemistry data for screening of adverse drug effects. Chemom. Intell. Lab. Syst. 2004, 73, 139-149.
(9) Hotelling, H. The most predictable criterion. J. Educ. Psychol. 1935, 26, 139-142.
(10) Greenacre, M. J. Theory and Applications of Correspondence Analysis; Academic Press: London, 1984.
(11) Bishop, C. M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, U.K., 1996.
(12) Wythoff, B. J. Backpropagation neural networks - a tutorial. Chemom. Intell. Lab. Syst. 1993, 18 (2), 115-155.
(13) Sivia, D. S. Data Analysis: A Bayesian Tutorial; Oxford University Press: Oxford, U.K., 1996.
(14) Rabiner, L. R.; Juang, B. H. An Introduction to Hidden Markov Models. IEEE ASSP Mag., January, 1986.
(15) Trygg, J.; Wold, S. Orthogonal projections to latent structures (O-PLS). J. Chemom. 2002, 16, 119-128.
(16) Trygg, J. O2-PLS for qualitative and quantitative analysis in multivariate calibration. J. Chemom. 2002, 16, 283-293.
(17) Trygg, J.; Wold, S. O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter. J. Chemom. 2003, 17, 53-64.
(18) Cloarec, O.; Dumas, M. E.; Trygg, J.; Craig, A.; Barton, R. H.; Lindon, J. C.; Nicholson, J. K.; Holmes, E. Evaluation of the orthogonal projection on latent structure model limitations caused by chemical shift variability and improved visualization of biomarker changes in H-1 NMR spectroscopic metabonomic studies. Anal. Chem. 2005, 77 (2), 517-526.
(19) Bylesjö, M.; Rantalainen, M.; Cloarec, O.; Nicholson, J. K.; Holmes, E.; Trygg, J. OPLS Discriminant Analysis - Combining the strengths of PLS-DA and SIMCA classification. J. Chemom., 2006, in press.
(20) Kvalheim, O. M. The latent variable. Chemom. Intell. Lab. Syst. 1992, 14, 1-3.
(21) Wold, S.; Sjöström, M.; Carlson, R.; Lundstedt, T.; Hellberg, S.; Skageberg, B.; Wikström, C. Multivariate design. Anal. Chim. Acta 1986, 17, 191.
(22) Carlson, R.; Lundstedt, T. Scope of organic synthetic reactions. Multivariate methods for exploring the reaction space. An example of the Willgerodt-Kindler reaction. Acta Chem. Scand. B 1987, 41, 164.
(23) Carlson, R.; Lundstedt, T.; Albano, C. Screening of suitable solvents for organic synthesis, strategies for solvent selection. Acta Chem. Scand. B 1984, 39, 79.
(24) Sandberg, M.; Sjöström, M.; Jonsson, J. A multivariate characterization of tRNA nucleosides. J. Chemom. 1996, 10, 493-508.
(25) Oprea, T. I.; Gottfries, J. Chemography: The art of navigating in chemical space. J. Comb. Chem. 2001, 3 (2), 157-166.
(26) deAguiar, P. F.; Bourguignon, B.; Khots, M.; Massart, D. L.; PhanThanLuu, R. D-optimal designs. Chemom. Intell. Lab. Syst. 1995, 30 (2), 199-210.
(27) The Standard Metabolic Reporting Structure, Version 2.3, http://www.smrsgroup.org/, January 13, 2006.
(28) Gullberg, J.; Jonsson, P.; Nordström, A.; Sjöström, M.; Moritz, T. Design of experiments: an efficient strategy to identify factors influencing extraction and derivatization of Arabidopsis thaliana samples in metabolomic studies with gas chromatography/mass spectrometry. Anal. Biochem. 2004, 331, 283-295.
(29) Jiye, A.; Trygg, J.; Gullberg, J.; Johansson, A. I.; Jonsson, P.; Antti, H.; Marklund, S. L.; Moritz, T. Extraction and GC/MS analysis of the human blood plasma metabolome. Anal. Chem. 2005, 77, 8086-8094.
(30) Dumas, M. E.; Maibaum, E. C.; Teague, C.; Ueshima, H.; Zhou, B.; Lindon, J. C.; Nicholson, J. K.; Stamler, J.; Elliott, P.; Chan, Q.; Holmes, E. Assessment of analytical reproducibility of 1H NMR spectroscopy based metabonomics for large-scale epidemiological research: the INTERMAP Study. Anal. Chem. 2006, 78, 2199-208.
(31) Dumas, M. E.; Canlet, C.; Debrauwer, L.; Martin, P.; Paris, A. Selection of biomarkers by a multivariate statistical processing of composite metabonomic data sets using multiple factor analysis. J. Proteome Res. 2005, 4, 1485-1492.
(32) Jonsson, P.; Gullberg, J.; Nordström, A.; Kowalczyk, M.; Sjöström, M.; Moritz, T. A strategy for extracting information from large series of non-processed complex GC/MS data. Anal. Chem. 2004, 76, 1738-1745.
(33) Jonsson, P.; Bruce, S. J.; Moritz, T.; Trygg, J.; Sjöström, M.; Plumb, R.; Granger, J.; Maibaum, E.; Nicholson, J. K.; Holmes, E.; Antti, H. Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets. Analyst 2005, 130, 701-707.
(34) Halket, J. M.; Przyborowska, A.; Stein, S. E.; Mallard, W. G.; Down, S.; Chalmers, R. A. Deconvolution gas chromatography mass spectrometry of urinary organic acids - Potential for pattern recognition and automated identification of metabolic disorders. Rapid Commun. Mass Spectrom. 1999, 13, 279-284.
(35) Shen, H. L.; Grung, B.; Kvalheim, O. M.; Eide, I. Automated curve resolution applied to data from multi-detection instruments. Anal. Chim. Acta 2001, 446 (1-2), 313-328.
(36) Jonsson, P.; Sjövik Johansson, E.; Wuolikainen, A.; Lindberg, J.; Schuppe-Koistinen, I.; Kusano, M.; Sjöström, M.; Trygg, J.; Moritz, T.; Antti, H. Predictive metabolite profiling applying hierarchical multivariate curve resolution to GC-MS data - A potential tool for multi-parametric diagnosis. J. Proteome Res. 2006, 5 (6), 1407-1414.
(37) Torgrip, R. J. O.; Aberg, M.; Karlberg, B.; Jacobsson, S. P. Peak alignment using reduced set mapping. J. Chemom. 2003, 17 (11), 573-582.
(38) Vogels, J. T. W. E.; Tas, A. C.; van den Berg, F.; van der Greef, J. A new method for classification of wines based on proton and carbon-13 NMR spectroscopy in combination with pattern recognition techniques. Chemom. Intell. Lab. Syst. 1993, 21 (2-3), 249-258.
(39) Holmes, E.; Nicholson, J. K.; Nicholls, A. W.; Lindon, J. C.; Connor, S. C.; Polley, S.; Connelly, J. The identification of novel biomarkers of renal toxicity using automatic data reduction techniques and PCA of proton NMR spectra of urine. Chemom. Intell. Lab. Syst. 1998, 44, 245-255.
(40) Stoyanova, R.; Nicholls, A. W.; Nicholson, J. K.; Lindon, J. C.; Brown, T. R. Automatic alignment of individual peaks in large high-resolution spectral data sets. J. Magn. Reson. 2004, 170, 329-35.
(41) Forshed, J.; Torgrip, R. J. O.; Aberg, K. M.; Karlberg, B.; Lindberg, J.; Jacobsson, S. P. A comparison of methods for alignment of NMR peaks in the context of cluster analysis. J. Pharm. Biomed. Anal. 2005, 38, 824-832.
(42) Forshed, J.; Schuppe-Koistinen, I.; Jacobsson, S. P. Peak alignment of NMR signals by means of a genetic algorithm. Anal. Chim. Acta 2003, 487, 189-199.
(43) Lee, G. C.; Woodruff, D. L. Beam search for peak alignment of NMR signals. Anal. Chim. Acta 2004, 513, 413-416.
(44) Hotelling, H. The generalization of Student’s ratio. Ann. Math. Stat. 1931, 2, 360-378.
(45) Eriksson, L.; Johansson, E.; Kettaneh Wold, N.; Wold, S. Multi and Megavariate Data Analysis; Umetrics AB, Umeå, Sweden, 2001.
(46) Miller, P.; Swanson, R. E.; Heckler, C. E. Contribution plots: A missing link in multivariate quality control. Appl. Math. Comp. Sci. 1998, 8 (4), 775-792.
(47) Akira, K.; Imachi, M.; Hashimoto, T. Investigations into biochemical changes of genetic hypertensive rats using 1H nuclear magnetic resonance-based metabonomics. Hypertens. Res. 2005, 28, 425-430.
(48) Yang, J.; Xu, G.; Zheng, Y.; Kong, H.; Pang, T.; Lv, S.; Yang, Q. Diagnosis of liver cancer using HPLC-based metabonomics avoiding false-positive result from hepatitis and hepatocirrhosis diseases. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2004, 813, 59-65.
(49) Lenz, E. M.; Bright, J.; Wilson, I. D.; Morgan, S. R.; Nash, A. F. A 1H NMR-based metabonomic study of urine and plasma samples obtained from healthy human subjects. J. Pharm. Biomed. Anal. 2003, 33, 1103-1115.
(50) Wang, Y.; Bollard, M. E.; Keun, H.; Antti, H.; Beckonert, O.; Ebbels, T. M.; Lindon, J. C.; Holmes, E.; Tang, H.; Nicholson, J. K. Spectral editing and pattern recognition methods applied to high-resolution magic-angle spinning 1H nuclear magnetic resonance spectroscopy of liver tissues. Anal. Biochem. 2003, 323, 26-32.
(51) Plumb, R. S.; Stumpf, C. L.; Gorenstein, M. V.; Castro-Perez, J. M.; Dear, G. J.; Anthony, M.; Sweatman, B. C.; Connor, S. C.; Haselden, J. N. Metabonomics: The use of electrospray mass spectrometry coupled to reversed-phase liquid chromatography shows potential for the screening of rat urine in drug development. Rapid Commun. Mass Spectrom. 2002, 16, 1991-1996.
(52) Robertson, D. G.; Reily, M. D.; Sigler, R. E.; Wells, D. F.; Paterson, D. A.; Braden, T. K. Metabonomics: Evaluation of nuclear magnetic resonance (NMR) and pattern recognition technology for rapid in vivo screening of liver and kidney toxicants. Toxicol. Sci. 2000, 57, 326-337.
(53) Kim, S. W.; Ban, S. H.; Ahn, C. Y.; Oh, H. M.; Chung, H.; Cho, S. H.; Park, Y. M.; Liu, J. R. Taxonomic discrimination of cyanobacteria by metabolic fingerprinting using proton nuclear magnetic resonance spectra and multivariate statistical analysis. J. Plant Biol. 2006, 49, 271-275.
(54) Rasmussen, B.; Cloarec, O.; Tang, H. R.; Staerk, D.; Jaroszewski, J. W. Multivariate analysis of integrated and full-resolution H-1-NMR spectral data from complex pharmaceutical preparations: St. John’s wort. Planta Med. 2006, 72, 556-563.
(55) Chen, H. W.; Pan, Z. Z.; Talaty, N.; Raftery, D.; Cooks, R. G. Combining desorption electrospray ionization mass spectrometry and nuclear magnetic resonance for differential metabolomics without sample preparation. Rapid Commun. Mass Spectrom. 2006, 20, 1577-1584.
(56) Halouska, S.; Powers, R. Negative impact of noise on the principal component analysis of NMR data. J. Magn. Reson. 2006, 178, 88-95.
(57) Wold, S. Pattern recognition by means of disjoint principal components models. Pattern Recognit. 1976, 8, 127-139.
(58) Dumas, M. E.; Canlet, C.; Vercauteren, J.; Andre, F.; Paris, A. Homeostatic signature of anabolic steroids in cattle using H-1-C-13 HMBC NMR metabonomics. J. Proteome Res. 2005, 4, 1493-1502.
(59) Odunsi, K. R.; Wollman, M.; Ambrosone, C. B.; Hutson, A.; McCann, S. E.; Tammela, J.; Geisler, J. P.; Miller, G.; Sellers, T.; Cliby, W.; Qian, F.; Keitz, B.; Intengan, M.; Lele, S.; Alderfer, J. L. Detection of epithelial ovarian cancer using 1H-NMR-based metabonomics. Int. J. Cancer 2005, 113, 782-788.
(60) Holmes, E.; Nicholls, A. W.; Lindon, J. C.; Connor, S. C.; Connelly, J. C.; Haselden, J. N.; Damment, S. J.; Spraul, M.; Neidig, P.; Nicholson, J. K. Chemometric models for toxicity classification based on NMR spectra of biofluids. Chem. Res. Toxicol. 2000, 13, 471-478.
(61) McKee, C. L. G.; Wilson, I. D.; Nicholson, J. K. Metabolic phenotyping of nude and normal (Alpk : ApfCD, C57BL10J) mice. J. Proteome Res. 2006, 5, 378-384.
(62) Wold, S.; Eriksson, L.; Sjöström, M. PLS in Chemistry. Encyclopedia of Computational Chemistry; Schleyer, P. V. R., Ed.; John Wiley & Sons: New York, 1998; pp 2006-2016.
(63) Wold, S.; Albano, C.; Dunn, W. J., III; Edlund, U.; Esbensen, K.; Geladi, P.; Hellberg, S.; Johansson, E.; Lindberg, W.; Sjöström, M. In Multivariate Data Analysis in Chemistry; NATO ASI Series C 138, D; Reidel Publ. Co.: Dordrecht, Holland, 1984.
(64) Wang, C. H.; Kong, W.; Guan, Y. F.; Yang, J.; Gu, J. R.; Yang, S. L.; Xu, G. W. Plasma phospholipid metabolic profiling and biomarkers of type 2 diabetes mellitus based on high-performance liquid chromatography/electrospray mass spectrometry and multivariate statistical analysis. Anal. Chem. 2005, 77, 4108-4116.
(65) Wang, C.; Kong, H.; Guan, Y.; Yang, J.; Gu, J.; Yang, S.; Xu, G. Plasma phospholipid metabolic profiling and biomarkers of type 2 diabetes mellitus based on high-performance liquid chromatography/electrospray mass spectrometry and multivariate statistical analysis. Anal. Chem. 2005, 77, 4108-4116.
(66) Dieterle, F.; Ross, A.; Schlotterbeck, G.; Senn, H. Metabolite projection analysis for fast identification of metabolites in metabonomics. Application in an amiodarone study. Anal. Chem. 2006, 78, 3551-3561.
(67) Yin, P.; Zhao, X.; Li, Q.; Wang, J.; Li, J.; Xu, G. Metabonomics study of intestinal fistulas based on ultraperformance liquid chromatography coupled with Q-TOF mass spectrometry (UPLC/Q-TOF MS). J. Proteome Res. 2006, 5, 2135-2143.
(68) Ramadan, Z.; Jacobs, D.; Grigorov, M.; Kochhar, S. Metabolic profiling using principal component analysis, discriminant partial least squares, and genetic algorithms. Talanta 2006, 68, 1683-1691.
(69) Constantinou, M. A.; Papakonstantinou, E.; Spraul, M.; Sevastiadou, S.; Costalos, C.; Koupparis, M. A.; Shulpis, K.; Tsantili-Kakoulidou, A.; Mikros, E. H-1 NMR-based metabonomics for the diagnosis of inborn errors of metabolism in urine. Anal. Chim. Acta 2005, 542, 169-177.
(70) Yang, J.; Zhao, X.; Liu, X.; Wang, C.; Gao, P.; Wang, J.; Li, L.; Gu, J.; Yang, S.; Xu, G. High performance liquid chromatography-mass spectrometry for metabonomics: potential biomarkers for acute deterioration of liver function in chronic hepatitis B. J. Proteome Res. 2006, 5, 554-561.
(71) Brindle, J. T.; Antti, H.; Holmes, E.; Tranter, G.; Nicholson, J. K.; Bethell, H. W.; Clarke, S.; Schofield, P. M.; McKilligin, E.; Mosedale, D. E.; Grainger, D. J. Rapid and noninvasive diagnosis of the presence and severity of coronary heart disease using 1H-NMR-based metabonomics. Nat. Med. 2002, 8, 1439-1444.
(72) Wagner, S.; Scholz, K.; Donegan, M.; Burton, L.; Wingate, J.; Volkel, W. Metabonomics and biomarker discovery: LC-MS metabolic profiling and constant neutral loss scanning combined with multivariate data analysis for mercapturic acid analysis. Anal. Chem. 2006, 78, 1296-1305.
(73) Cloarec, O.; Dumas, M. E.; Craig, A.; Barton, R. H.; Trygg, J.; Hudson, J.; Blancher, C.; Gauguier, D.; Lindon, J. C.; Holmes, E.; Nicholson, J. Statistical total correlation spectroscopy: An exploratory approach for latent biomarker identification from metabolic H-1 NMR data sets. Anal. Chem. 2005, 77, 1282-1289.
(74) Stella, C.; Beckwith-Hall, B.; Cloarec, O.; Holmes, E.; Lindon, J. C.; Powell, J.; van der Ouderaa, F.; Bingham, S.; Cross, A. J.; Nicholson, J. K. Susceptibility of human metabolic phenotypes to dietary modulation. J. Proteome Res. 2006, 5, 2780-2788.
(75) Coen, M.; Ruepp, S. U.; Lindon, J. C.; Nicholson, J. K.; Pognan, F.; Lenz, E. M.; Wilson, I. D. Integrated application of transcriptomics and metabonomics yields new insight into the toxicity due to paracetamol in the mouse. J. Pharm. Biomed. Anal. 2004, 35, 93-105.
(76) Smilde, A. K.; Jansen, J. J.; Hoefsloot, H. C. J.; Lamers, R. J. A. N.; van der Greef, J.; Timmerman, M. E. ANOVA-simultaneous component analysis (ASCA): A new tool for analyzing designed metabolomics data. Bioinformatics 2005, 21, 3043-3048.
(77) Dieterle, F.; Schlotterbeck, G. T.; Ross, A.; Niederhauser, U.; Senn, H. Application of metabonomics in a compound ranking study in early drug development revealing drug-induced excretion of choline into urine. Chem. Res. Toxicol. 2006, 19, 1175-1181.
(78) Bollard, M. E.; Keun, H. C.; Beckonert, O.; Ebbels, T. M.; Antti, H.; Nicholls, A. W.; Shockcor, J. P.; Cantor, G. H.; Stevens, G.; Lindon, J. C.; Holmes, E.; Nicholson, J. K. Comparative metabonomics of differential hydrazine toxicity in the rat and mouse. Toxicol. Appl. Pharmacol. 2005, 204, 135-151.
(79) Keun, H. C.; Ebbels, T. M.; Bollard, M. E.; Beckonert, O.; Antti, H.; Holmes, E.; Lindon, J. C.; Nicholson, J. K. Geometric trajectory analysis of metabolic responses to toxicity can define treatment specific profiles. Chem. Res. Toxicol. 2004, 17, 579-587.
(80) Ishihara, K.; Katsutani, N.; Aoki, T. A metabonomics study of the hepatotoxicants galactosamine, methylene dianiline and clofibrate in rats. Basic Clin. Pharmacol. Toxicol. 2006, 99, 251-260.
(81) Williams, R. E.; Lenz, E. M.; Lowden, J. S.; Rantalainen, M.; Wilson, I. D. The metabonomics of aging and development in the rat: an investigation into the effect of age on the profile of endogenous metabolites in the urine of male rats using 1H NMR and HPLC-TOF MS. Mol. Biosyst. 2005, 1, 166-175.
(82) Williams, R. E.; Lenz, E. M.; Rantalainen, M.; Wilson, I. D. The comparative metabonomics of age-related changes in the urinary composition of male Wistar-derived and Zucker (fa/fa) obese rats. Mol. Biosyst. 2006, 2, 193-202.
(83) Schnackenberg, L. K.; Jones, R. C.; Thyparambil, S.; Taylor, J. T.; Han, T.; Tong, W.; Hansen, D. K.; Fuscoe, J. C.; Edmondson, R. D.; Beger, R. D.; Dragan, Y. P. An integrated study of acute effects of valproic acid in the liver using metabonomics, proteomics, and transcriptomics platforms. OMICS 2006, 10, 1-14.
(84) Bollard, M. E.; Keun, H. C.; Beckonert, O.; Ebbels, T. M. D.; Antti, H.; Nicholls, A. W.; Shockcor, J. P.; Cantor, G. H.; Stevens, G.; Lindon, J. C.; Holmes, E.; Nicholson, J. K. Comparative metabonomics of differential hydrazine toxicity in the rat and mouse. Toxicol. Appl. Pharmacol. 2005, 204, 135-151.
(85) Antti, H.; Bollard, M. E.; Ebbels, T.; Keun, H.; Lindon, J. C.; Nicholson, J. K.; Holmes, E. Batch statistical processing of H-1 NMR-derived urinary spectral data. J. Chemom. 2002, 16, 461-468.
(86) Wold, S.; Kettaneh, N.; Friden, H.; Holmberg, A. Modelling and diagnostics of batch processes and analogous kinetic experiments. Chemom. Intell. Lab. Syst. 1998, 44, 331-340.
(87) Lindon, J. C.; Holmes, E.; Nicholson, J. K. Metabonomics: Systems biology in pharmaceutical research and development. Curr. Opin. Mol. Ther. 2004, 6, 265-272.
(88) Kleno, T. G.; Kiehr, B.; Baunsgaard, D.; Sidelmann, U. G. Combination of ‘omics’ data to investigate the mechanism(s) of hydrazine-induced hepatotoxicity in rats and to identify potential biomarkers. Biomarkers 2004, 9, 116-138.
(89) Rantalainen, M.; Cloarec, O.; Beckonert, O.; Wilson, I. D.; Jackson, D.; Tonge, R.; Rowlinson, R.; Rayner, S.; Nickson, J.; Wilkinson, R. W.; Mills, J. D.; Trygg, J.; Nicholson, J. K.; Holmes, E. Statistically integrated metabonomic-proteomic studies on a human prostate cancer xenograft model in mice. J. Proteome Res. 2006, 5 (10), 2642-2655.
(90) Yap, I. K.; Clayton, T. A.; Tang, H.; Everett, J. R.; Hanton, G.; Provost, J. P.; Le Net, J. L.; Charuel, C.; Lindon, J. C.; Nicholson, J. K. An integrated metabonomic approach to describe temporal metabolic disregulation induced in the rat by the model hepatotoxin allyl formate. J. Proteome Res. 2006, 5, 2675-2684.
(91) Craig, A.; Sidaway, J.; Holmes, E.; Orton, T.; Jackson, D.; Rowlinson, R.; Nickson, J.; Tonge, R.; Wilson, I.; Nicholson, J. Systems toxicology: integrated genomic, proteomic and metabonomic analysis of methapyrilene induced hepatotoxicity in the rat. J. Proteome Res. 2006, 5, 1586-1601.
(92) Martin, F. P. J.; Wang, Y.; Yap, I. K. S.; Lundstedt, T.; Lek, P.; Lindon, J. C.; Sprenger, N.; Kochhar, S.; Fay, L. B.; Holmes, E.; Nicholson, J. K. NMR and UPLC-MS based multi-compartment metabonomic investigation of the contribution of different dietary probiotics to host metabolism. J. Proteome Res., to be submitted for publication.
(93) Wold, S.; Kettaneh, N.; Tjessem, K. Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection. J. Chemom. 1996, 10 (5-6), 463-482.
(94) Eriksson, L.; Johansson, E.; Lindgren, F.; Sjöström, M.; Wold, S. Megavariate analysis of hierarchical QSAR data. J. Comput.-Aided Mol. Des. 2002, 16, 711-726.
(95) Gunnarsson, I.; Andersson, P. M.; Wikberg, J.; Lundstedt, T. Multivariate analysis of G protein-coupled receptors. J. Chemom. 2003, 17, 82-92.

PR060594Q
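As a supplementary illustration of the two-level hierarchical PCA procedure described in the text (a separate PCA per block, block scores used as "super variables", then a PCA on the combined scores), a minimal sketch in Python might look as follows. It is not the implementation used in the cited papers. The function names, the simulated blocks, the number of components per level, and the unit-variance scaling of the super variables are all assumptions made for illustration.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return PCA scores T and loadings P of mean-centred X, computed via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]
    P = Vt[:n_components].T
    return T, P

def hierarchical_pca(blocks, n_lower=2, n_upper=2):
    """Two-level H-PCA: one PCA per block, then a PCA on the stacked block scores."""
    # Lower level: a separate PCA model for each block of variables.
    lower = [pca_scores(Xb, n_lower) for Xb in blocks]
    # The block scores become the "super variables" of the upper level.
    super_X = np.hstack([T for T, _ in lower])
    # Assumption: scale each super variable to unit variance so no block dominates.
    super_X = super_X / super_X.std(axis=0, ddof=1)
    # Upper level: PCA on the matrix of super variables.
    T_up, P_up = pca_scores(super_X, n_upper)
    return lower, T_up, P_up

# Hypothetical data: two blocks (e.g., two compartments) sharing one latent factor.
rng = np.random.default_rng(0)
n = 20
latent = rng.normal(size=(n, 1))
block_a = latent @ rng.normal(size=(1, 50)) + 0.1 * rng.normal(size=(n, 50))
block_b = latent @ rng.normal(size=(1, 30)) + 0.1 * rng.normal(size=(n, 30))

lower, T_up, P_up = hierarchical_pca([block_a, block_b])
print(T_up.shape, P_up.shape)  # (20, 2) (4, 2)
```

In this sketch, the rows of P_up correspond to the super variables (two per block), so the upper-level loadings indicate which block drives a grouping; the lower-level loadings in `lower` can then be inspected for the original variables, mirroring the two-step interpretation described in the text.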

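The batch-modeling data layout of Figure 7 and the control chart of Figure 8 can likewise be sketched under stated assumptions. The example below unfolds a simulated three-way table into an X-matrix with one block of rows per object, uses the time point as y, extracts a single PLS component, and builds control limits from the control objects. The simulated data, the use of only one component, and the choice of three standard deviations for the limits are illustrative assumptions, not details taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(1)
n_objects, n_times, n_vars = 6, 8, 40
time = np.arange(n_times, dtype=float)

# Hypothetical control data: each object follows a shared time trend plus noise.
trend = rng.normal(size=n_vars)
profiles = (time[None, :, None] * trend[None, None, :]
            + 0.5 * rng.normal(size=(n_objects, n_times, n_vars)))

# Unfold the three-way table: one block of rows per object, one row per time point.
X = profiles.reshape(n_objects * n_times, n_vars)
y = np.tile(time, n_objects)

# First PLS component for a single y (PLS1): weight vector w and scores t.
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean
w = Xc.T @ yc
w /= np.linalg.norm(w)
t = Xc @ w

# Fold the scores back to (object, time) and build the control chart.
T = t.reshape(n_objects, n_times)
avg = T.mean(axis=0)
sd = T.std(axis=0, ddof=1)
upper, lower = avg + 3 * sd, avg - 3 * sd  # assumed 3-SD limits

# Project a new object and flag time points outside the limits.
new_obj = time[:, None] * trend[None, :] + 0.5 * rng.normal(size=(n_times, n_vars))
new_obj[5:] += 3.0 * trend  # simulated deviation from time point 5 onward
t_new = (new_obj - x_mean) @ w
outside = (t_new > upper) | (t_new < lower)
print(np.flatnonzero(outside))
```

Time points flagged in `outside` are those where the new object's score trajectory leaves the band around the average control trajectory; in practice such deviations would then be examined with contribution plots.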
470 Journal of Proteome Research • Vol. 6, No. 2, 2007
