07 - 03 - 2018 - The Classi
To cite this article: R. Gil Solsona, C. Boix, M. Ibáñez & J. V. Sancho (2018): The classification
of almonds (Prunus dulcis) by country and variety using UHPLC-HRMS-based untargeted
metabolomics, Food Additives & Contaminants: Part A, DOI: 10.1080/19440049.2017.1416679
ARTICLE
CONTACT J. V. Sancho, Research Institute for Pesticides and Water (IUPA), University Jaume I, Castellón, Spain
Color versions of one or more of the figures in this article can be found online at www.tandfonline.com/TFAC
Supplemental data for this article can be accessed online.
© 2018 Taylor & Francis Group, LLC
2 R. GIL SOLSONA ET AL.
compounds may not be the best distinguishing markers, as a targeted approach was used. Untargeted metabolomics could be a good option to highlight the best compounds for discriminating samples in different scenarios, such as animal diets (Ruiz-Aracama et al. 2011) or food traceability, including almond classification by cultivar (Beltrán Sanahuja et al. 2011). As previously noted for targeted approaches, the main drawback of these studies was the lack of additional validation steps to confirm that promising markers are robust.

The untargeted metabolomics approach has become very useful for food control (Gil-Solsona et al. 2016; Sales et al. 2017). In this sense, powerful chromatographic techniques coupled to high-resolution MS (HRMS) (Emwas 2015) provide a suitable tool to meet this goal, as reflected by their increasing use in recent years in food authenticity and control (Castro-Puyana and Herrero 2013; Rubert et al. 2015).

The main aim of this research was to investigate the applicability of an untargeted metabolomics approach using ultra-high-performance liquid chromatography (UHPLC) coupled to HRMS to classify almond samples according to their country of origin as well as their variety. For this purpose, almonds from Spain and the USA were employed. Sample extracts were injected and, after multivariate analysis, the most relevant compounds were highlighted using a Variable Importance in Projection (VIP) selection method. Partial least squares – discriminant analysis (PLS-DA) was employed to create both models, which were validated with samples from a second season employed as a system challenge (Riedl et al. 2015). MS/MS experiments were performed for the highlighted markers, which were tentatively elucidated with the help of online databases.

Materials and methods

Reagents and chemicals

HPLC-grade water was obtained from a Milli-Q water purification system (Millipore Ltd, Bedford, MA, USA). HPLC-grade methanol (MeOH), HPLC-supergradient ACN, sodium hydroxide (>99%) and reagent-grade ammonium acetate (NH4Ac) were obtained from Scharlab (Barcelona, Spain). Leucine-enkephalin (mass-axis calibration) and formic acid (mobile phase modifier) were purchased from Sigma-Aldrich.

Sampling

Spanish almonds of different varieties (Bitter almond, Belona, Carrerona, Comuna, Ferranduel, Guara, Largueta and Marcona) were purchased from Frusema Company (Albocasser, Castellón, Spain). Almonds from the USA (Bute-padre, California and Non-Pareil) were obtained from FruSecs Company (Albocasser, Castellón, Spain). In a second season, samples were again obtained from Frusema and FruSecs; in this case, an additional Spanish variety, Soleta, was also sampled. A total of 62 sample packages, each containing 100 g of an individual variety, were employed.

Sample processing

Raw samples (100 g) were triturated and homogenised; 2.5 g of sample were weighed and mixed with 10 ml of ACN:H2O (80:20) 0.1% HCOOH. After mechanical shaking for 90 min, extracts were sonicated for 15 min and centrifuged for 10 min at 4500 × g. The supernatant was diluted fourfold with Milli-Q water and stored at −24 °C until analysis. A pool of all the extracts, named QC, was also prepared to obtain an average extract of the sample set. This pool was used for column stabilisation (by injecting 10 QC samples at the beginning of each sample batch) and to control possible instrumental signal variation along the sequence.

UHPLC-HRMS

A Waters Acquity UPLC system (Waters, Milford, MA, USA) was coupled to a hybrid quadrupole-TOF mass spectrometer (Xevo G2 QTOF, Waters, Manchester, UK) using a Z-spray ESI interface operating in both positive and negative ionisation modes. The UHPLC separation was performed on a CORTECS® C18 fused-core analytical column (2.7 μm particle size, 100 × 2.1 mm; Waters) at a flow rate of 300 μl/min, using H2O 0.01% HCOOH as weak mobile phase (A) and MeOH 0.01% HCOOH as strong mobile phase (B). The percentage of B was changed from 10% at 0 min to 90% at 14 min, 90% at 16 min and
FOOD ADDITIVES & CONTAMINANTS: PART A 3
10% at 16.01 min, with a total run time of 18 min. The injection volume was 10 μl. Nitrogen was used as both the desolvation gas and the nebulising gas. Capillary voltages of 0.7 and 1.5 kV were used for positive and negative ion modes, respectively, with a cone voltage of 25 V. MS data were acquired over an m/z range of 50–1200. TOF-MS resolution was approximately 20,000 (full width at half maximum) at m/z 556.2771. The collision gas was argon 99.995% (Praxair, Valencia, Spain). The desolvation gas flow was set at 1000 l/h and the cone gas flow at 80 l/h. The desolvation gas temperature was set to 600 °C, the source temperature to 130 °C and the column temperature to 40 °C.

For MSE experiments, two acquisition functions with different collision energies were created: the low-energy (LE) function, with a fixed collision energy of 4 eV, and the high-energy (HE) function, with a collision energy ramp from 15 to 40 eV. The LE function provides the (de)protonated molecule and the HE function a wide range of fragment ions. Both LE and HE functions used a scan time of 0.3 s with an inter-scan delay of 0.05 s and were acquired simultaneously within the same injection.

MS/MS experiments were carried out under the same conditions, with collision energies selected according to the fragmentation observed for each compound. Mass-axis calibration was performed over m/z 50–1200 with a 1:1 mixture of 0.05 M NaOH:5% HCOOH diluted (1:25) with H2O:ACN (20:80), at a flow rate of 10 μl/min. For automated accurate mass measurement, a leucine-enkephalin solution (2 μg/ml) in ACN:H2O (50:50) at 0.1% HCOOH was pumped at 20 μl/min through the lock-spray needle and measured every 30 s, with a scan time of 0.3 s. The (de)protonated molecule of leucine-enkephalin, at m/z 556.2771 in positive mode and m/z 554.2615 in negative mode, was used to recalibrate the mass axis during the injection and to ensure robust accurate-mass measurements over time.

Data processing

The untargeted metabolomics data workflow (Figure S2) starts by converting the LC-MS raw data from the proprietary format (.raw, Waters Corp.) to a generic format (.cdf, NetCDF) using the Databridge application (within MassLynx v 4.1; Waters Corporation). The converted data were then processed using the XCMS R package (https://xcmsonline.scripps.edu/) (Smith et al. 2006). The centWave feature detection algorithm was employed for peak picking (peak width from 5 to 20 s, S/N ratio higher than 10 and mass tolerance of 15 ppm) to convert chromatograms into a list of detected features. This was followed by retention time alignment, to identify the same ion across different samples with slightly different retention times (differences of around 10 s). The aligned features were labelled as MxxxTyyy, where xxx corresponds to the nominal m/z of the detected ion and yyy to the retention time in seconds. Mean centring was applied to normalise each data set, minimising instrumental drifts between samples. Finally, a log2 transformation was applied to the area of each detected signal to reduce heteroscedasticity, followed by Pareto scaling, which gives each feature a "statistical weight" based on the differences between groups rather than on its total area.

Multivariate analysis

Principal component analysis (PCA) and PLS-DA were performed with the EZ-Info software (Umetrics, Sweden). First, PCA was used to verify the absence of outliers and the correct grouping of QC samples after normalisation. PLS-DA was then applied to reduce the dimensionality of the data set. By means of VIP filtering, the minimum set of ions required to achieve a good classification model was obtained.

Each model was created with 75–80% of the data set, including samples from both the first and second seasons to ensure that the selected compounds were independent of the harvest year, while the remaining 20–25% were not included in the model creation. With these two groups, the model was validated in two steps: first, cross-validation was applied to assess the model goodness; second, an additional validation was carried out with the 20–25% of the samples not included in the model. The statistical model gives two columns: the first (Likely Classification), where the model assigns the sample unequivocally to a group, and the second (Less Likely Classification), where the model can provide no result, only one result or more than one result. Samples with only one result in this column were accepted as classified, while the rest were treated as unknown.
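The preprocessing and decision steps described under Data processing and Multivariate analysis can be illustrated with a short Python sketch. It is a minimal, standard-library-only illustration under stated assumptions, not the authors' XCMS/EZ-Info workflow. The per-sample normalisation is assumed to divide each sample's feature areas by that sample's mean area. MxxxTyyy labels are assumed to round the m/z and retention time to the nearest integer. The Less Likely Classification column is represented as a plain list of candidate groups. The function names and the peak areas in the example are hypothetical, and PLS-DA and VIP filtering are not reproduced.

```python
import math
from statistics import mean, stdev


def feature_label(mz, rt_seconds):
    """Build an MxxxTyyy label from the ion m/z and retention time (s).

    Assumption: both values are rounded to the nearest integer.
    """
    return f"M{round(mz)}T{round(rt_seconds)}"


def normalise_samples(areas):
    """Divide each sample (row) by its mean feature area.

    Assumed interpretation of the per-sample normalisation step: it removes
    overall intensity differences (e.g. instrumental drift) between samples.
    """
    return [[a / mean(row) for a in row] for row in areas]


def log2_pareto(areas):
    """Log2-transform the areas, then Pareto-scale each feature (column).

    Pareto scaling mean-centres each column and divides it by the square
    root of its standard deviation.
    """
    logged = [[math.log2(a) for a in row] for row in areas]
    scaled_columns = []
    for col in zip(*logged):
        mu = mean(col)
        sd = stdev(col)
        # Guard against constant features (sd == 0)
        factor = math.sqrt(sd) if sd > 0 else 1.0
        scaled_columns.append([(x - mu) / factor for x in col])
    # Transpose back to one row per sample
    return [list(row) for row in zip(*scaled_columns)]


def final_assignment(less_likely_groups):
    """Decision rule applied to the 'Less Likely Classification' column.

    Exactly one candidate group -> accepted as classified;
    no result or several results -> reported as unknown.
    """
    if len(less_likely_groups) == 1:
        return less_likely_groups[0]
    return "unknown"


if __name__ == "__main__":
    print(feature_label(318.2022, 239.4))  # M318T239
    # Hypothetical peak areas: three samples (rows) x two features (columns)
    areas = [[1200.0, 800.0], [2400.0, 1700.0], [900.0, 650.0]]
    for row in log2_pareto(normalise_samples(areas)):
        print([round(v, 3) for v in row])
    print(final_assignment(["Marcona"]))          # Marcona
    print(final_assignment([]))                   # unknown
    print(final_assignment(["Guara", "Belona"]))  # unknown
```

Keeping each step in its own function mirrors the order given in the text (normalisation, log2 transformation, Pareto scaling, decision rule). A different normalisation can therefore be swapped in if the per-sample interpretation assumed here does not match the original processing.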
Marker identification

The MS/MS spectra of the most significant metabolites at 10, 20, 30 and 40 eV were acquired and searched in online databases such as METLIN (https://metlin.scripps.edu/landing_page.php?pgcontent=mainPage), or the metabolites were tentatively elucidated in silico with the MetFrag software (https://msbi.ipb-halle.de/MetFragBeta/), employing ChemSpider as the chemical structure library. When no hits were obtained, we attempted to elucidate them manually.

Results and discussion

Sample treatment

Regarding the polarity of their compounds, almonds contain two fractions: a polar fraction (studied in this paper) and a less-polar fraction (mainly composed of lipids). The polar fraction requires polar solvents (water, methanol, acetonitrile), while other kinds of solvents should be employed to extract less-polar compounds, such as dichloromethane/methanol mixtures, as discussed in the literature (Cevallos-Cevallos et al. 2009), or even butanol and 2-propanol when dealing directly with oils (Gil-Solsona et al. 2016). For this reason, the polar compounds were extracted with ACN:H2O (80:20) 0.1% HCOOH, which has proved to be a good extraction solvent for a wide range of food matrices (Beltrán et al. 2013).

Data treatment

Both data sets (from positive and negative ionisation modes) were joined in a single file, with a total of 1555 different ions. Data were then analysed with PCA. At this point, QC samples were employed as an external standard to check the correct normalisation. The QC sample, as explained in the Sample processing section, is a pool of all the samples employed to build the model. After normalisation, this sample, which has an average composition, should appear grouped in the centre of the PCA scores plot (an unsupervised method), meaning that the normalisation steps (mean centring, log2 transformation and Pareto scaling) have corrected possible instrumental drifts and differences along the batch. In this case, after observing the correct QC grouping and the absence of outliers (Figure S1), PLS-DA models were created for country and variety classification.

It is important to note that both data sets were joined in a single file in order to extract the best ions for the discrimination step regardless of the ionisation mode. As we employed an untargeted strategy, if one of the two ionisation modes better explained the differences between groups, its ions would be preferentially selected by VIP filtering. Otherwise, the selected markers should be the set of ions that best explains the differences, independently of their ionisation mode.

Country classification by PLS-DA

Initially, a model to differentiate the origin of the almonds was created. Samples were divided into three different groups: Spanish almonds (Belona, Carrerona, Comuna, Ferranduel, Guara, Largueta, Marcona and Soleta), USA almonds (Bute-padre, California and Non-pareil) and bitter almonds,
which showed a different behaviour (Figure 1). As previously explained, both sample sets (first and second season) were mixed and 80% of the samples were employed to create the model, while the remaining 20% were used for model validation.

Figure 1. PLS-DA model for country classification. (a) Score plot of the first two components and (b) score plot with test samples.

The most important ions for the model were initially reduced to 20 by means of their VIP values, ensuring that all the samples in the cross-validation were correctly classified. However, even though a soft ionisation source was employed, more than one ion could be obtained for each marker compound. Therefore, adducts and/or in-source fragments corresponding to the same marker were excluded based on mass accuracy as well as chromatographic profile. Finally, the total number of ions was reduced to only five, one per compound. The PLS-DA model built with these five ions showed a goodness-of-fit (R2Y) of 0.848 and a goodness-of-prediction (Q2Y) of 0.771 for the first two components. It was then validated in two steps, as recommended in the literature (Riedl et al. 2015). The first step was the cross-validation of the sample set employed to build the model: all 50 samples, which included both seasons, were correctly labelled, supporting the robustness of the model. For the final validation, 12 samples not initially included in the model creation (two bitter almonds, five Spanish almonds and five USA almonds) were employed to test the model. All of them were properly classified, showing the successful applicability of the model.

The markers (see Table 1) were tentatively identified after MS/MS experiments, and their most important product ions are listed in Table S1. For the marker labelled as M318T239, searching its accurate mass (m/z 318.2022) in the METLIN database retrieved several hits. However, the comparison of the empirical MS/MS spectrum with available spectra allowed us to tentatively elucidate it as a tripeptide (Val-Thr-Val). In a similar way, marker M298T178 was tentatively elucidated as 5ʹ-deoxy-5ʹ-(methylthio)adenosine.

M293T201 and M448T119 were tentatively identified as glucopyranosyl hydroxycaproic acid and diglucopyranosyl niacin, respectively, after observing losses corresponding to hexose groups (C6H10O5) in their MS/MS spectra. The remaining product ions allowed us to identify the corresponding aglycone moiety using the METLIN database.

The METLIN database was used to search for compound M933T337, but as no results were obtained, it was further evaluated using the MetFrag in-silico fragmentation web tool, searching for possible structures in ChemSpider. Based on its fragmentation pattern, with consecutive neutral losses of amygdalin and hexose rendering a final product ion at m/z 297.0948, this marker was tentatively elucidated as de-hypoxanthine futalosine conjugated with amygdalin and one hexose.

Spanish varieties classification by PLS-DA

In a second step, a classification model was created to differentiate among the Spanish varieties included in the first model. Samples were obtained by mixing almonds of the same variety from different Spanish regions. This helps ensure that the highlighted markers are robust for any Spanish almond of a given variety, independently of the region where it was grown.

For this variety classification model, Spanish almonds were divided into seven groups (Belona, Carrerona, Comuna, Ferranduel, Guara, Largueta and Marcona), employing 75% (30 samples) of all the Spanish almonds to train the model and the remaining 25% (10 samples) to validate it, with at least one sample per variety. As only one sample of the Soleta variety was obtained, it was employed only to test the model, evaluating potential misclassifications.

VIP filtering was applied to the whole table and features were checked to include only one ion per
compound, typically the (de)protonated molecule or an adduct. Only 20 ions were necessary to build a model (see Table 2) with satisfactory goodness-of-fit (R2Y = 0.866) and goodness-of-prediction (Q2Y = 0.760) using eight components (Figure 2(a)). This classification model was again validated in two steps. In the cross-validation, 29 samples were correctly labelled and only one remained as unknown. The final validation of the model was performed with 10 samples not included in the initial model (1 Belona, 1 Carrerona, 1 Comuna, 1 Ferranduel, 1 Guara, 2 Largueta, 2 Marcona and 1 Soleta). The Soleta sample was used to check that samples from varieties not included in the model were not wrongly labelled. Eight samples were correctly classified, while two samples were classified as unknown. One of these was the Soleta sample, labelled as missing in Figure 2(b), showing that the model avoids misclassifying new almond varieties.

Figure 2. PLS-DA model for variety classification. (a) Score plot of the first two components and (b) score plot with test samples. Soleta sample is labelled as unknown.

MS/MS spectra were acquired for these 20 ions and searched in online databases (METLIN or HMDB). When no results were obtained (12 out of 20), elucidation was attempted with the MetFrag in-silico fragmentation tool, which annotated only two additional compounds; the remaining 10 ions had to be elucidated manually, the most complex and lengthy step in the metabolomics workflow. In this case, we tentatively elucidated four extra markers, leaving six markers reported only as chemical formulas.

M268T85, M450T190, M348T852, M166T99, M318T239, M503T55, M377T68 and M577T127 were found in METLIN and tentatively elucidated after comparing their experimental spectra with online records. M287T340 and M593T393 were tentatively elucidated with the MetFrag tool, selecting the highest-scoring molecules. Of the remaining compounds, M464T246, M548T340, M647T403 and M476T114 were manually elucidated. An example of manual elucidation is shown in Figure 3. MS/MS experiments were carried out for the M464T246 marker, annotated as an ammonium adduct based on accurate-mass full-scan spectra (Figure S3). A product ion at m/z 285.1343 (+1.1 mDa mass error), corresponding to the loss of NH3 plus a C6H10O5 group, was observed at 10 eV, indicating the presence of at least one hexose unit in the molecule. Then, two consecutive water losses were observed at m/z 267.1216 (+0.3 mDa) and 249.1114 (+0.3 mDa). Furthermore, the product ion at m/z 163.0602 was assigned to an additional hexose unit with elemental composition [C6H11O5]+, which is supported by two further consecutive water losses at m/z 145.0501 (−0.4 mDa) and 127.0387 (0.9 mDa). On this basis, the compound was tentatively identified as hexosyl-2-phenylethyl glucopyranoside, as shown in Table 2. The remaining compounds were approached in the same way. Six could not be elucidated unambiguously, as more than one compound fitted the experimental spectra; in any event, their two main product ions are reported in Table S2.

With these 20 compounds, 95% of the samples were correctly classified according to their variety, and the Soleta sample was not misclassified, showing
the robustness of the model in assigning the correct variety and avoiding false positive assignments.

Figure 3. Structural elucidation for M464T246. MS/MS spectra at 10 eV (bottom) and 20 eV (top) of the ammonium adduct.

Conclusions

This work has shown that untargeted metabolomics is a powerful technique for developing classification models that differentiate foods based not only on their origin but also on their variety. The selected extraction procedure provides a fast and easy analysis, obtaining robust results. Additionally, the UHPLC-HRMS technique allows the analysis of a wide range of compounds occurring at low concentrations, highlighting those that provide the most differentiation. The analysis in both positive and negative ionisation modes provides information on compounds with a wider range of acidities, which, supported by the appropriate HRMS sensitivity, highlights the best compounds to create classification models.

One of the advantages of QTOF instruments is the possibility to perform tandem mass spectrometry experiments with accurate mass information, which strongly helps in the elucidation process. The model allowed us to discriminate the origin of the almonds with only five compounds, differentiating between Spanish and American almonds while also keeping bitter almonds as a separate group. Furthermore, 20 markers have been selected to discriminate the almond variety, enabling 95% of the samples to be classified correctly. In the future, these promising results will be validated with a larger sample set to ensure that the models remain robust and accurate in subsequent seasons.

Acknowledgments

The authors acknowledge the support from Generalitat Valenciana (Group of Excellence Prometeo II/2017/023). This work has also been developed with financial support from Universitat Jaume I (UJI-B2016-10).

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work was supported by the Generalitat Valenciana [Group of Excellence Prometeo II/2017/023]; Universitat Jaume I [UJI-B2016-10].

ORCID

R. Gil Solsona http://orcid.org/0000-0003-0937-9072
J. V. Sancho http://orcid.org/0000-0002-6873-4778