Acta Tropica
journal homepage: www.elsevier.com/locate/actatropica
Keywords: NIRS; Forensic entomology; ATR-FTIR

Unequivocal identification of fly specimens is an essential requirement in forensic entomology. Herein, a simple, non-destructive and rapid method based on two vibrational spectroscopy techniques [near-infrared spectroscopy (NIRS) and attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy] coupled with variable selection techniques such as genetic algorithm-linear discriminant analysis (GA-LDA) and successive projection algorithm-linear discriminant analysis (SPA-LDA) was applied to identify and discriminate six species of flesh flies (Diptera: Sarcophagidae) native to Neotropical regions. This novel approach is based on the unique spectral “fingerprints” of their biochemical composition. One hundred sixty (160) NIRS and FT-IR specimens (120 male, 40 female) were acquired; different pre-processing methods such as baseline correction, derivative and Savitzky-Golay smoothing were also performed. In addition, the multivariate classification results were assessed using sensitivity, specificity, positive predictive value (precision), negative predictive value, Youden index, and positive and negative likelihood ratios. Principal component analysis (PCA) was employed for the male vs. female category using NIRS, clearly separating the classes with only three principal components and 99% explained variance. Differentiation between the genera Oxysarcodexia, Peckia and Ravinia was efficiently confirmed by both techniques. In comparison with other biological methods, this approach represents an effective choice for fast and non-destructive identification in forensic entomology.
Corresponding author at: Institute of Chemistry, Biological Chemistry and Chemometrics, UFRN, Natal, RN, 59.072-970, Brazil.
E-mail address: [email protected] (K.M.G. Lima).
These authors contributed equally to this work.
Received 22 February 2018; Received in revised form 14 April 2018; Accepted 22 April 2018
Available online 24 April 2018
0001-706X/ © 2018 Elsevier B.V. All rights reserved.
T.M. Barbosa et al. Acta Tropica 185 (2018) 1–12
and destructive procedure and the use of bulky instrumentation that impairs in-field monitoring. A rapid, inexpensive and non-destructive method for species identification with the potential for high throughput would thus be desirable as an alternative to morphology-based methods. However, most techniques require sophisticated equipment and expensive reagents, and frequently demand a large number of individuals, a major hindrance given that sarcophagids are usually found in high species richness and low abundance in nature (Sousa et al., 2014).

In this scenario, vibrational spectroscopies such as near-infrared (NIR) and attenuated total reflection Fourier-transform infrared (ATR-FTIR) are label-free, rapid, non-destructive techniques that are cost-effective and require little sample preparation. They can be used to determine the insect metabolic identity (lipids, proteins, cellular processes) and to differentiate between species based on their absorbance characteristics, because the cuticle of each species may have a unique chemical composition (Lima et al., 2014). Thus, they can be as specific as barcoding, without the need for time-consuming and expensive DNA extraction and analysis techniques. Because absorption is determined by the internal and external biochemical composition of the organism, a species will have a “fingerprint” based on its particular absorption spectrum (Rodríguez-Fernández et al., 2011).

NIR applications in entomology have been varied so far. The technique has been used to determine the age and species of the Anopheles gambiae sensu lato complex (Sikulu et al., 2010), to detect flunitrazepam in larvae, puparia and adults of necrophagous blow flies (Chrysomya megacephala, Chrysomya albiceps and Cochliomyia macellaria) (Oliveira et al., 2014; Baia et al., 2016), to discriminate live individuals of two Drosophila species (D. subobscura and D. obscura) (Fischnaller et al., 2012), to identify stored grain beetles (Jia et al., 2007), to differentiate species of Lepidoptera (Dowell et al., 2005), and to determine gender in fly pupae (Dowell et al., 1999).

However, this technique generates several hundreds or even thousands of variables in the near-infrared/infrared spectra. In addition, redundancy and collinearity are widespread among these variables, since they contain interference from background, noise and overlapping bands, which makes it challenging to build a high-quality calibration model for predicting unknown samples. Therefore, the use of appropriate chemometric tools for multivariate calibration and classification is largely responsible for advancing spectroscopic techniques. These include partial least squares (PLS) (Dupuy et al., 2010), principal component regression (PCR) (Xie and Kalivas, 1997), artificial neural networks (ANN) (Makino et al., 2010) and least squares support vector machines (LS-SVM) (Shao et al., 2012). Further tools are principal component analysis (PCA) for initial data reduction (Marques et al., 2013), hierarchical cluster analysis (HCA) to analyze groups in a set of data on the basis of spectral similarities (Martin et al., 2011) and linear discriminant analysis (LDA) to classify unknown samples into predetermined groups (Cheung et al., 2011). Finally, a successful approach to overcome problems with redundancy or collinearity is the successive projections algorithm (SPA) (Pontes et al., 2005) or the genetic algorithm (GA) (Tapp et al., 2003) in conjunction with linear discriminant analysis (LDA).

The choice and development of the multivariate classification approaches underpin reliable insect identification using NIR/IR spectroscopy. For instance, multivariate classification quality features such as sensitivity, specificity, positive and negative predictive values, Youden index, and positive and negative likelihood ratios should be calculated to ensure the validity of the results in accordance with international guidelines (Costa et al., 2016).

Herein, we have evaluated a simple, non-destructive and rapid method based on two vibrational spectroscopy techniques (NIR and ATR-FTIR) coupled with variable selection techniques such as genetic algorithm-linear discriminant analysis (GA-LDA) and successive projection algorithm-linear discriminant analysis (SPA-LDA) for the identification and discrimination of Sarcophagidae species. In our study, sample preparation, spectroscopic measurement, data preprocessing, feature selection and analytical validation were developed. Specifically, we aimed to: i) test the validity of NIR and ATR-FTIR for identifying flesh fly species native to Neotropical regions; ii) compare the effectiveness of the two methods for taxonomical purposes; and iii) infer the applicability of NIR as an accessible tool for species identification in comparison to morphological methods. To our knowledge, this is the first application of PCA-LDA, SPA-LDA and GA-LDA to differentiate insect samples based on spectral data.

2. Material and methods

2.1. Insects

Specimens used in this study were collected from different locations in Pernambuco State, Northeastern Brazil, between October 2012 and August 2013. Collection was done using traps baited with decomposing chicken liver and fish, previously exposed for 48 h at 24 °C. The field-caught flies were killed with ethyl acetate for morphology-based species identification. Ten male specimens of each species were identified [Oxysarcodexia timida (OTI), Oxysarcodexia thornax (OT), Peckia chrysostoma (PC), Peckia lambens (PS), Ravinia belforti (RF) and Tricharaea occidua (TA)], and 40 unidentified female individuals were selected for the analyses. These species were chosen due to their forensic and/or medical relevance and ubiquitous presence in several environments in the Neotropical Region (Vasconcelos et al., 2015). Prior to the analysis, specimens were removed from the alcohol and left to dry on absorbent tissue paper for at least 10 min, to allow for alcohol evaporation.

Table 1a
Number of training, validation and prediction specimens in each category for the three-class classification.

Category   Training set   Validation set   Prediction set
PC         14             3                3
PS         14             3                3
RF         14             3                3

Table 1b
Number of training, validation and prediction specimens in each category for the classification of all classes.

Category   Training set   Validation set   Prediction set
OTI        14             3                3
OT         14             3                3
PC         14             3                3
PS         14             3                3
RF         14             3                3
TA         14             3                3

2.2. NIR spectroscopy

NIR spectral measurements [n = 60: 20 Peckia chrysostoma, 20 Peckia lambens, 20 Ravinia belforti] were performed using an Antaris MX FT-NIR spectrophotometer (Thermo Fisher Scientific Inc., USA) equipped with a transflectance fiber optic probe. The NIR spectra were obtained over a range of 10,000–4000 cm−1 (1000–2500 nm) and were recorded with a spectral resolution of 32 cm−1, with 32 co-added scans. The measurement time was 26 s (32 scans) per spectrum. The transflectance probe was washed with ethanol (70% v/v) and dried using tissue paper after each sample. Cleanliness of the transflectance probe was verified by collecting an absorbance spectrum of the probe using the most recently collected background as a reference. Spectral measurements were done in an acclimatized room under a controlled temperature of 22 °C and 60% relative air humidity. Samples were allowed to equilibrate to this temperature before the analysis.
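The two ways of stating the NIR range (10,000–4000 cm−1 and 1000–2500 nm) are related by λ(nm) = 10^7 / ν(cm−1). The short sketch below only illustrates that unit conversion; the function name is hypothetical and nothing here reproduces the instrument software.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e7 / wavenumber_cm1


# Limits of the NIR range quoted in the text, in cm^-1
nir_range_cm1 = (10_000.0, 4_000.0)
nir_range_nm = tuple(wavenumber_to_wavelength_nm(v) for v in nir_range_cm1)
print(nir_range_nm)  # (1000.0, 2500.0)
```

The same relation runs in the other direction, so a wavelength axis in nm can be mapped back to wavenumbers with 10^7 / λ.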
Fig. 1. (a) Average spectra acquired from three categories. The spectra from Peckia chrysostoma (PC) are shown in black; those from Peckia lambens (PS) are shown in green; those from Ravinia belforti (RF) are shown in pink. (b) Pre-processed spectra from three categories. The spectra from PC are shown in black; those from PS are shown in green; those from RF are shown in pink. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
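The abstract names Savitzky-Golay smoothing and derivatives among the pre-processing steps behind spectra such as those in panel (b). As a minimal sketch of how such a filter works, the code below applies the standard 5-point quadratic Savitzky-Golay kernels. The 5-point window, the polynomial order and the helper name `apply_kernel` are assumptions made for illustration; the settings actually used for the figure are not stated here.

```python
from typing import List, Sequence

# Standard 5-point, quadratic Savitzky-Golay coefficients.
# The window size and polynomial order are assumptions for this sketch.
SG_SMOOTH_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
SG_DERIV1_5 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]


def apply_kernel(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the signal; only interior points are returned."""
    half = len(kernel) // 2
    out = []
    for i in range(half, len(signal) - half):
        window = signal[i - half : i + half + 1]
        out.append(sum(c * y for c, y in zip(kernel, window)))
    return out


# Hypothetical absorbance values on an evenly spaced axis (not real data)
spectrum = [0.10, 0.12, 0.11, 0.15, 0.22, 0.30, 0.27, 0.20, 0.16, 0.15]
smoothed = apply_kernel(spectrum, SG_SMOOTH_5)
first_derivative = apply_kernel(spectrum, SG_DERIV1_5)  # per unit point spacing
print(len(spectrum), len(smoothed), len(first_derivative))
```

Because the kernels come from a local quadratic fit, a signal that is itself quadratic passes through the smoothing step unchanged, which is a convenient sanity check.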
2.3. ATR-FTIR spectroscopy

IR spectra [n = 160: 60 male (2 spectra each), 40 female] were collected from each individual insect using a Bruker Lumos FTIR spectrometer with motorized ATR crystal (Bruker Optics Ltd, Coventry, U.K.). Prior to analyzing each specimen, the diamond crystal within the spectrometer was washed and a background spectrum was obtained to account for atmospheric composition.

2.4. Chemometrics methods: PCA-LDA, SPA-LDA, GA-LDA and PCA

LDA is a supervised linear transformation that projects the variables (wavenumbers, for example) into a variable-reduced space, which is optimal for discrimination between treatment classes. LDA seeks a projection matrix such that the Fisher criterion (i.e. the ratio of the between-class variance to the within-class variance) is maximized after the projection. The variables created through LDA (factors) are linear combinations of the wavenumber-absorbance intensity values (Martin et al., 2007). Thus, the use of LDA for identification or classification of spectral data generally requires appropriate variable selection procedures (Silva et al., 2013). In the present study, PCA, SPA and GA were used for this purpose. In the PCA-LDA, SPA-LDA and GA-LDA models, the validation set was used to guide the variable selection, a strategy to avoid overfitting. The optimum number of variables for SPA-LDA and GA-LDA was determined from the minimum of the cost function G calculated for a given validation data set as:

$$G = \frac{1}{N_V} \sum_{n=1}^{N_V} g_n \qquad (1)$$

where $N_V$ is the number of validation samples and $g_n$ is defined as
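For reference, in the SPA-LDA and GA-LDA literature (e.g. Pontes et al., 2005) the term $g_n$ of the cost function is commonly taken as a ratio of squared Mahalanobis distances. The form below is that common choice, assumed rather than confirmed to be the one used in this study:

```latex
g_n = \frac{r^2\left(x_n, m_{I(n)}\right)}{\min_{I(m) \neq I(n)} r^2\left(x_n, m_{I(m)}\right)}
```

Under this assumption, $x_n$ is the n-th validation sample, $I(n)$ is the index of its true class, $m_{I(n)}$ is the mean of that class, and $r^2$ is the squared Mahalanobis distance; these symbols are introduced here for illustration. A sample that lies close to its own class mean and far from the nearest wrong class gives a small $g_n$, so minimising G favours variable subsets that separate the classes.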
Fig. 2. (a) Variables selected by SPA-LDA in the PC, PS and RF classification. (b) Variables selected by GA-LDA in the PC, PS and RF classification. (c) Variables selected by SPA-LDA in the OTI, OT, PC, PS, RF and TA classification. (d) Variables selected by GA-LDA in the Oxysarcodexia timida (OTI), Oxysarcodexia thornax (OT), Peckia chrysostoma (PC), Peckia lambens (PS), Ravinia belforti (RF) and Tricharaea occidua (TA) classification.
The data import and pre-treatment, as well as the construction of the chemometric models, were implemented in MATLAB R2014a (MathWorks Inc., Natick, MA, USA). NIR raw spectra were pre-processed

For the NIR method, the raw and pre-treated spectra are shown in Fig. 1a and b. When considering all classes, the discriminant functions did not present a very clear segregation for PCA-LDA, SPA-LDA or GA-LDA. When the classes were reduced to the most discriminating, the number of informative variables decreased from 78 to 39 (SPA-LDA) and from 48 to 24 (GA-LDA); both decreases were statistically significant (Yates-corrected χ² = 13, P = .0004, and Yates-corrected χ² = 8, P = .0067, respectively) and can be seen in Fig. 2a and b. The discriminant functions showed clearer segregation for the species Peckia chrysostoma (PC), Peckia lambens (PS) and Ravinia belforti (RF) (Fig. 3a–c).

3.2. Species segregation (Males) using FT-IR

The raw and preprocessed spectra for the FT-IR method are shown in Fig. 4a and b, respectively. The biological fingerprint region was used to build the classification models with the same PCA-LDA, SPA-LDA and GA-LDA algorithms. Despite showing a tendency to segregate between classes, PCA-LDA formed only one group (Fig. 5a). In the case of SPA-LDA, the number of variables selected was similar to that of the NIR method (78 variables) (Fig. 2c), whereas the number was lower in GA-LDA (31 variables) (Fig. 2d). In addition, FT-IR was more sensitive in segregating pre-defined classes, with clearer group formation in both SPA-LDA and GA-LDA (Fig. 5b and c). Regardless of the algorithm used, the results were similar: Tricharaea occidua (TA) and Ravinia belforti (RF) were clearly set in specific clusters apart from the other clusters, which in turn formed groups by genus: a) a group of the genus Peckia [P. chrysostoma (PC) near P. lambens (PS)] and b) a group of the genus Oxysarcodexia [O. timida (OTI) close to O. thornax (OT)].

3.3. Identification of females from male profiles

The raw spectra and the spectra cut to the biological fingerprint region for male specimens of O. thornax (OT) and P. chrysostoma (PC) and for unidentified females are shown in Fig. 6a and b. The best separation between males and females using PCA was achieved using three principal components (PCs), which together represented more than 99.5% of the total data variance. That is, these components captured most of the information needed to discriminate both sex and species among Sarcophagidae adults.

Based on the analysis of the components, PC1 and PC2 were efficient in allowing segregation between sex and species, so that clusters above the PC2 axis mean different species, while the right
Fig. 3. (a) DF1 × DF2 discriminant function values calculated by principal component analysis-linear discriminant analysis (PCA-LDA) for the three categories. (b) DF1 × DF2 discriminant function values calculated using the variables selected by successive projection algorithm-linear discriminant analysis (SPA-LDA) for the three categories. (c) DF1 × DF2 discriminant function values calculated using the variables selected by genetic algorithm-linear discriminant analysis (GA-LDA) for the three categories.
Fig. 4. (a) Average spectra acquired from all categories. The spectra from Oxysarcodexia timida (OTI) are shown in blue; those from Oxysarcodexia thornax (OT) are shown in red; those from Peckia chrysostoma (PC) are shown in cyan; those from Peckia lambens (PS) are shown in green; those from Ravinia belforti (RF) are shown in purple; those from Tricharaea occidua (TA) are shown in black. (b) Pre-processed spectra from all categories. The spectra from OTI are shown in blue; those from OT are shown in red; those from PC are shown in cyan; those from PS are shown in green; those from RF are shown in purple; those from TA are shown in black. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
PC1 clusters correspond to the two male species, P. chrysostoma and O. thornax (Fig. 7a). The PC2 component confirms the segregation between clusters corresponding to the species, and also points to unidentified specimens of the genus Peckia that appeared as outliers. PC3 clearly demonstrates the separation between male and female specimens (Fig. 7b). That is, three components are sufficient to identify female specimens by comparison with the profiles obtained for males of P. chrysostoma and O. thornax.

3.4. Performance of methods

Classification rates were determined by using the best models. Tables 2a and 2b present the performance features for the optimized models (PCA-LDA, SPA-LDA and GA-LDA) of each category. According to the sensitivity and specificity results shown in Tables 2a and 2b, these rates varied according to the model used (PCA-LDA, SPA-LDA or GA-LDA). Furthermore, as shown in Table 2b, the specificity for all categories suggests that PCA-LDA as well as GA-LDA presented improved accuracy in comparison with SPA-LDA.

PCA-LDA achieved sensitivity and specificity scores of 100% for all the species categories, thus showing that the species can be relatively well classified by these methods. However, in general the other tested methods also showed high sensitivity and specificity (Table 2b). These results show that NIR and ATR-FTIR microspectroscopy in conjunction with powerful chemometric approaches have the potential to identify and differentiate species of necrophagous flies captured on a corpse.
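The performance features reported in Tables 2a and 2b (sens, spec, ppv, npv, you, LR(+), LR(−)) follow from the confusion counts of one class against the rest. The sketch below uses the standard textbook definitions; the function names and the example counts are hypothetical and are not taken from the study. Ratios that are undefined (for example LR(+) when there are no false positives) are returned as None in this sketch.

```python
from typing import Dict, Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def performance_features(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """One-vs-rest performance features for a single class, in percent where usual."""
    sens = ratio(100.0 * tp, tp + fn)   # sensitivity
    spec = ratio(100.0 * tn, tn + fp)   # specificity
    youden = None if sens is None or spec is None else sens + spec - 100.0
    return {
        "sens": sens,
        "spec": spec,
        "ppv": ratio(100.0 * tp, tp + fp),   # positive predictive value
        "npv": ratio(100.0 * tn, tn + fn),   # negative predictive value
        "you": youden,                        # Youden index
        # LR(+) = sens / (1 - spec), written with counts to avoid rounding
        "LR(+)": ratio(tp * (fp + tn), fp * (tp + fn)),
        # LR(-) = (1 - sens) / spec, written with counts to avoid rounding
        "LR(-)": ratio(fn * (tn + fp), tn * (tp + fn)),
    }


# Hypothetical counts for one class (illustration only, not from the paper)
example = performance_features(tp=2, fp=1, tn=2, fn=1)
for name, value in example.items():
    print(name, value)
```

With these illustrative counts the sensitivity and specificity are both about 66.67%, the Youden index is about 33.33, LR(+) is 2 and LR(−) is 0.5.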
Fig. 5. (a) DF1 × DF2 discriminant function values calculated by principal component analysis-linear discriminant analysis (PCA-LDA) for all categories. (b) DF1 × DF2 discriminant function values calculated using the variables selected by successive projection algorithm-linear discriminant analysis (SPA-LDA) for all categories. (c) DF1 × DF2 discriminant function values calculated using the variables selected by genetic algorithm-linear discriminant analysis (GA-LDA) for all categories.
Fig. 6. (a) Average spectra acquired from three categories. The spectra from Oxysarcodexia thornax (OT) are shown in red; those from Peckia chrysostoma (PC) are shown in green; those from females are shown in gray. (b) Pre-processed spectra from three categories. The spectra from OT are shown in red; those from PC are shown in green; those from females are shown in gray. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. (a) Biplot of PC1 × PC2 scores calculated by principal component analysis (PCA); (b) Biplot of PC2 × PC3 scores calculated by principal component analysis.
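The share of variance attributed to a set of principal components, such as the figure quoted for three PCs, is the ratio of the leading eigenvalues of the covariance matrix to its trace. The sketch below computes those ratios in plain Python using power iteration with deflation, a choice made only to keep the example dependency-free; it is not the MATLAB implementation used in the study, and the function name and toy data are hypothetical.

```python
import math
from typing import List, Sequence


def explained_variance_ratios(rows: Sequence[Sequence[float]],
                              n_components: int,
                              n_iter: int = 500) -> List[float]:
    """Fraction of total variance captured by each of the leading PCs."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total = sum(cov[i][i] for i in range(p))
    if total == 0.0:
        raise ValueError("data have zero variance")
    ratios = []
    for _component in range(n_components):
        # Fixed, non-uniform starting vector for the power iteration
        start = [float(j + 1) for j in range(p)]
        scale = math.sqrt(sum(x * x for x in start))
        vec = [x / scale for x in start]
        eigval = 0.0
        for _step in range(n_iter):
            w = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                eigval = 0.0
                break
            vec = [x / norm for x in w]
            eigval = norm
        ratios.append(eigval / total)
        # Deflation: remove the component just found before the next pass
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return ratios


# Toy data with two uncorrelated variables (hypothetical, not spectra)
toy = [[-2.0, 0.0], [2.0, 0.0], [0.0, -1.0], [0.0, 1.0]]
ratios = explained_variance_ratios(toy, n_components=2)
print([round(r, 3) for r in ratios], round(sum(ratios), 3))
```

Summing the first k ratios gives the cumulative explained variance for k components, which is the quantity reported alongside the score plots.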
species and the varieties of this taxon (Kim et al., 2004) because the technique is based on vibrations correlated to functional groups present in biomolecules on the exoskeleton, such as carbohydrates, proteins, cuticular lipids, DNA and RNA, whereas in the case of FT-NIR the overtones of these functional groups may overlap, making it difficult to visualize possible spectral differences. On the other hand, the apparatus needed for NIR is more easily handled in the field than the equipment used in the FT-IR analysis. Nevertheless, it can be argued that both FT-NIR and FT-IR can be used to discriminate closely related species (Maree and Viljoen, 2011). This was demonstrated in a study that applied NIR for the identification of species and subspecies of Zootermopsis (Isoptera: Termopsidae), with over 95% and 80% precision, respectively (Aldrich et al., 2007).

In this study, NIR and ATR-FTIR were successful in discriminating the genera and species of sarcophagids, such as Peckia chrysostoma and Ravinia belforti, previously recorded on cadavers (Oliveira and Vasconcelos, 2010; Vasconcelos et al., 2014). Given that the scarcity of bionomical and behavioural studies limits the potential of Sarcophagidae species in estimating the post-mortem interval (PMI) (Vairo et al., 2014), accurate identification of the target species of this study is of utmost importance. From a medical standpoint, identification of larvae and adults is crucial for developing the monitoring and control of flesh fly species that act as vectors of pathogens and causal agents of myiasis (Greenberg, 1971; Vairo et al., 2014).

In this study, it was possible to identify females at the species level based on the spectral signatures of adult males of the same species, explaining 99.5% of the variance between classes with only three PCs. This contribution expands the usefulness of females in practical cases of criminal investigations, since ephemeral substrates (carcasses, cadavers) are mostly visited by females for feeding and deposition of immatures; the female-biased sex ratio frequently exceeds 90% in field studies (Barbosa et al., 2017). Taxonomical keys currently available prioritize morphological characters of the male (e.g. shape of the aedeagus), so that tools based on female
and FT-IR were calculated and can be seen in Tables 2a and 2b, respectively. FT-NIR spectroscopy showed sensitivity and specificity close to the values predicted by the morphological methodology, with around 60% accuracy for the PS class and higher values for the other classes. On the other hand, FT-IR spectroscopy presented values of sensitivity and specificity superior to the morphological methodology, reaching 100% correct classification. In addition, this technique has other advantages: it does not require sample preparation, does not generate residue, and is fast, non-destructive and low-cost (Rodríguez-Fernández et al., 2011).

The results show that the high efficiency of FT-NIR, revealed in the high potential for identifying females, can be extrapolated to discriminate the immature stages (larvae and pupae) of species of recognized medical and forensic potential, as already seen for the Calliphoridae family (Pickering et al., 2015). From the forensic science perspective, our study shows that the taxonomic tool resulting from infrared spectroscopy (FT-NIR and FT-IR) acts as a starting point for developing a spectral database library covering different necrophagous species, allowing for practical use by forensic investigators. The busy routine of a criminal expert in cities exposed to high homicide rates, for example in Northeast Brazil, makes it impossible to collect, rear and identify necrophagous specimens under controlled conditions, which is a necessary procedure to obtain reliable entomological evidence. Cooperation with academic institutions would strengthen investigations by elucidating the identity of the entomological agent involved. However, it is extremely important to substantiate the identification through reliable methods in order to guarantee the quality of the spectral bank.

We thank Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for financial support, the team of the Insects of Forensic Importance Research Group for their invaluable help with insect collection, and the Instituto Chico Mendes de Conservação da Biodiversidade (ICMBio) for sampling authorization. K.M.G. Lima acknowledges the CNPq/Capes project (Grant 070/2012 and 442087/2014-4) for financial support.

Table 2a
Values of quality performance features from three classification methods (PCA-LDA, SPA-LDA and GA-LDA) by FTIR microspectroscopy for three categories.

Model     Feature   PC      PS      RF
PCA-LDA   sens      66.67   0       66.67
          spec      66.67   0       66.67
          ppv       66.67   0       66.67
          npv       66.67   0       66.67
          you       33.33   −100    33.33
          LR(+)     2       0       2
          LR(−)     0.5     0       0.5
SPA-LDA   sens      66.67   66.67   100
          spec      66.67   66.67   100
          ppv       66.67   66.67   100
          npv       66.67   66.67   100
          you       33.33   33.33   100
          LR(+)     2       2       0
          LR(−)     0.5     0.5     0
GA-LDA    sens      100     66.67   66.67
          spec      100     66.67   33.33
          ppv       100     66.67   50
          npv       100     66.67   50
          you       100     33.33   0
          LR(+)     0       2       1
          LR(−)     0       0.5     1

Table 2b
Values of quality performance features from three classification methods (PCA-LDA, SPA-LDA and GA-LDA) by FTIR microspectroscopy for all categories.

PCA-LDA   sens      100     100      100     100      100   100
          spec      100     100      100     100      100   100
          ppv       100     100      100     100      100   100
          npv       100     100      100     100      100   100
          you       100     100      100     100      100   100
          LR(+)     0       0        0       0        0     0
          LR(−)     0       0        0       0        0     0
SPA-LDA   sens      66.66   33.33    66.66   33.33    100   100
          spec      66.66   33.33    66.66   33.33    100   100
          ppv       66.66   33.33    66.66   33.33    100   100
          npv       66.66   33.33    66.66   33.33    100   100
          you       33.33   −33.33   33.33   −33.33   100   100
