Fpsyt 11 00440
Edited by:
Albert Yang, National Yang-Ming University, Taiwan
Reviewed by:
Kaiming Li, Sichuan University, China
Raymond Salvador, FIDMAG Hermanas Hospitalarias Research Foundation, Spain
*Correspondence:
Rajat Mani Thomas, [email protected]
† These authors share first authorship
Specialty section:
This article was submitted to Neuroimaging and Stimulation, a section of the journal Frontiers in Psychiatry
Received: 10 February 2020
Accepted: 28 April 2020
Published: 15 May 2020
Citation:
Thomas RM, Gallo S, Cerliani L, Zhutovsky P, El-Gazzar A and van Wingen G (2020) Classifying Autism Spectrum Disorder Using the Temporal Statistics of Resting-State Functional MRI Data With 3D Convolutional Neural Networks. Front. Psychiatry 11:440. doi: 10.3389/fpsyt.2020.00440
Resting-state functional magnetic resonance imaging (rs-fMRI) data are 4-dimensional volumes (3-space + 1-time) that have been posited to reflect the underlying mechanisms of information exchange between brain regions, making rs-fMRI an attractive modality for developing diagnostic biomarkers of brain dysfunction. The enormous success of deep learning in computer vision has sparked recent interest in applying deep learning in neuroimaging. But the dimensionality of rs-fMRI data is too high (~20 M), making it difficult to meaningfully process the data in its raw form for deep learning experiments. It is currently not clear how the data should be engineered to optimally extract the time information, and whether combining different representations of time could provide better results. In this paper, we explored various transformations that retain the full spatial resolution by summarizing the temporal dimension of the rs-fMRI data, therefore making it possible to train a full three-dimensional convolutional neural network (3D-CNN) even on a moderately sized [~2,000 from Autism Brain Imaging Data Exchange (ABIDE)-I and II] data set. These transformations summarize the activity in each voxel of the rs-fMRI, or that of the voxel and its neighbors, to a single number. For each brain volume, we calculated regional homogeneity, the amplitude of low-frequency fluctuations, the fractional amplitude of low-frequency fluctuations, degree centrality, eigenvector centrality, local functional connectivity density, entropy, voxel-mirrored homotopic connectivity, and auto-correlation lag. We trained the 3D-CNN on a publicly available autism dataset to classify the rs-fMRI images as being from individuals with autism spectrum disorder (ASD) or from healthy controls (CON) at an individual level. We attained competitive results of ~66% on this task for the combined ABIDE-I and II datasets. When all summary measures were combined, the result was still only as good as that of the best single measure, which was regional homogeneity (ReHo). In addition, we also applied the support vector machine (SVM) algorithm on the same dataset and achieved comparable results, suggesting that 3D-CNNs could not learn additional information from these temporal transformations that were more useful to differentiate ASD from CON.
Keywords: neuroimaging, 3D convolutional neural network, classification, autism, Autism Brain Imaging Data
Exchange, deep learning
perform better than linear or other conventional machine-learning models in neuroimaging (8, 9), and in order to form a robust baseline, we also performed the classification using a linear-SVM. To gain insight into the information the models are using for the classification task, we performed an occlusion experiment. This type of analysis describes which regions of the brain contribute maximally to the output of the most successful neural networks. The code used for all the experiments is available on GitHub¹.
MATERIALS AND METHODS
Autism Brain Imaging Data Exchange Dataset
We used the ABIDE dataset for all of our experiments. The ABIDE I+II dataset is a collection of structural (T1w) and functional (rs-fMRI) brain images aggregated across 29 institutions (10), available for download². It includes 1,028 participants with a diagnosis of autism, Asperger or pervasive developmental disorder-not otherwise specified (called ASD from now on), and 1,141 typically developing participants (CON). Virtually all the ASD participants were high functioning (99.95% with IQ > 70). Most of the included participants were adolescents (median age 13 years, range between 5 and 64 years of age), 1/3 of whom were diagnosed as ASD, and 20% of the total participants were female, which represents an important addition with respect to most previous autism studies, which focused on males exclusively. The rs-fMRI image acquisition time ranges from 2 to 10 min, with 85% of the datasets meeting the suggested duration (~ 4–5 min) for obtaining robust rs-fMRI estimates (11). We set the minimum scan duration to 100 time points, which led us to include 96% of the whole ABIDE I+II dataset (N=2,085, N(ASD)=993), the vast majority of which (95%) had a minimum acquisition time of 4 min.
Resting-State Functional MRI Preprocessing
The preprocessing was done using the Configurable Pipeline for the Analysis of Connectomes (C-PAC) (12). We followed a preprocessing strategy adopted by the Preprocessed Connectome Project initiative³. This will allow others to replicate and extend the findings in this paper. Our preprocessing pipeline consisted of i) motion correction; ii) nuisance regression, which included head motion modeled as 24 regressors (13), scanner drift modeled using a quadratic and a linear term, and physiological noise modeled using the five principal components from a decomposition of white matter and cerebrospinal fluid voxel time series (CompCor) (14); iii) coregistration of the resulting rs-fMRI image to the subject's anatomical image using the FMRIB Software Library (FSL) Boundary-Based (BB) register. Finally, iv) the images were normalized onto the standard Montreal Neurological Institute (MNI) space (4 mm) with the non-linear registration algorithm from ANTs (15). All the above steps were configured using C-PAC's Singularity image⁴.
Quality Check and Subject Selection
After preprocessing, we selected subjects from the ABIDE I+II datasets following the list provided in (16). The authors of (16) performed an automatic quality control (QC) by selecting those subjects that retained at least 100 frames or 4 min of fMRI scans after motion scrubbing based on framewise displacement. The subjects were then visually inspected by the authors (16). Only subjects that passed the entire QC process are included in this study. The procedure yielded a total of 1,162 subjects (773 from ABIDE-I and 389 from ABIDE-II), 620 of which were classified as ASD. Please refer to the original article for an extensive description of the procedure. See Table 1 for more details on the sample composition.
Site | Group | Group # (females #) | Mean age | s.d. age | Selected for leave-site-out CV
The table illustrates the sample composition: the site of data collection in alphabetical order (SITE_ID), the number of subjects categorized as ASD or CON (group #), the number of female individuals in each group (females #), and the mean and standard deviation of age per group (in years; mean age and s.d. age, respectively). The last column indicates whether the data from that site were used as test-data in the leave-site-out procedure.
¹ https://github.com/galloselene/TempStats_3D-CNN.git
² http://fcon_1000.projects.nitrc.org/indi/abide/
³ http://preprocessed-connectomes-project.org/abide/
⁴ https://fcp-indi.github.io/docs/user/quick.html
Resting-State Functional MRI Summary Measures
The preprocessed rs-fMRI images were transformed into nine summary measures that reduced the temporal dimension of the data to a single number per voxel by highlighting different statistical features of the time series. The summary measures chosen were:
I. Regional homogeneity (ReHo), a voxel-based measure of brain activity which evaluates the similarity or synchronization between the time series of a given voxel and its nearest neighbors (17).
II. Amplitude of low frequency fluctuations (ALFF), defined as the total power within the frequency range between 0.01 and 0.1 Hz, which thus indexes the strength or intensity of low frequency oscillations (18, 19).
III. Fractional amplitude of low frequency fluctuations (fALFF), defined as the power within the low-frequency range (0.01–0.1 Hz) divided by the total power in the entire detectable frequency range, representing the relative contribution of specific low frequency oscillations to the whole frequency range (20).
IV. Degree centrality (DC, weighted), a measure of local network connectivity that identifies the most connected nodes by counting the number of direct connections (edges) to all other nodes. As such, a node with high DC will have direct connections to many other nodes in the network. Degree centrality analysis tends to emphasize higher order cortical association areas while showing reduced sensitivity for paralimbic and subcortical regions (20, 21).
V. Eigenvector centrality (EC, weighted), a measure of global network connectivity. The EC of a given node reflects the number of direct connections it has with other nodes that have high centrality (21).
VI. Local functional connectivity density (LFCD, weighted), a quantification of the number of local and global functional connections for each voxel in the brain (22, 23).
VII. Entropy, a measure of organization and predictability of a system (17).
VIII. Voxel-mirrored homotopic connectivity (VMHC), a quantification of functional homotopy that provides a voxel-wise measure of connectivity between hemispheres. This is done by computing the connectivity between each voxel in one hemisphere and its mirrored counterpart in the other (20, 21).
IX. Auto-correlation lag, the correlation between the past and present states of a time series; thus, a high correlation indicates that the state of the series does not change over time (24).
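
To make the idea of collapsing a time series to one number per voxel concrete, the sketch below computes ReHo for a single voxel neighborhood. It assumes ReHo is computed as Kendall's coefficient of concordance (W), which is the common definition but is not spelled out in the text above. The function names and toy time series are hypothetical, rank ties are not corrected for, and this is not the implementation used in the study.

```python
def rank(series):
    """Return 1-based ranks of the values in a time series (assumes no ties)."""
    order = sorted(range(len(series)), key=lambda t: series[t])
    ranks = [0] * len(series)
    for position, t in enumerate(order, start=1):
        ranks[t] = position
    return ranks


def kendalls_w(time_series):
    """Kendall's W for K time series (a voxel plus its neighbors), each of length n."""
    k = len(time_series)
    n = len(time_series[0])
    ranked = [rank(ts) for ts in time_series]
    # Sum of ranks at each time point across the K voxels.
    rank_sums = [sum(r[t] for r in ranked) for t in range(n)]
    mean_rank_sum = k * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (k ** 2 * (n ** 3 - n))


# Toy neighborhood 1: three voxels whose signals rise and fall together.
synchronous = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [10.0, 20.0, 30.0, 40.0, 50.0],
]
# Toy neighborhood 2: two voxels with exactly opposite orderings.
opposed = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]

reho_sync = kendalls_w(synchronous)
reho_opposed = kendalls_w(opposed)
print(reho_sync, reho_opposed)
```

In a whole-brain setting the same function would be applied to every voxel together with its neighbors (the neighborhood size is a further choice not fixed here), producing one 3D volume per subject.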
FIGURE 1 | Schematic representation of our approach. First, the resting-state functional MRI (fMRI) data are summarized on the time domain while keeping the
spatial resolution intact. The resulting 3D volume is then input to the 3D-convolutional neural network (CNN) models for the classification task. The 3D-CNN model
used across experiments is schematically illustrated. Abbreviation: “Conv” stands for convolutional layer, and the n associated to each layer indicates the number of
kernels. “Linear” stands for linear fully connected layer.
The mean and standard deviation of the summary measures were calculated within voxels across brain volumes belonging to the training sample, and these values were used to normalize the entire dataset. This procedure, referred to as feature scaling, speeds up the convergence of the model during training. Moreover, in the particular context of the ABIDE dataset, in which imaging data are known to vary in quality and have been collected using different scanner hardware and sequences, the procedure might help to mitigate the heterogeneity of the data.
Network Architecture
In this work, we utilized a 3D CNN architecture to classify ASD from CON subjects. The architecture is inspired by (16). We reduced the number of layers and filters per layer to bring the number of parameters down to around 257 k, thereby reducing computational complexity and cost. Starting from Khosla et al.'s model, we performed a non-systematic parameter and hyper-parameter search, and carried out the full experiments on the best configuration. Before settling on the above architecture we also experimented with 3D versions of popular computer-vision architectures like ResNet-50 (25), Visual Geometry Group (VGG)-net (26), and its variants, without much success.
Our model (Figure 1) consisted of a first layer of average pooling of size 2 and stride 2, which functioned as a down-sampling step. Two convolutional layers with an exponential linear unit (ELU) activation followed. The first convolutional layer had 64 filters of size 3 and the second convolutional layer had 16 filters of size 3. The convolutional layers were followed by a max-pooling layer with a kernel dimension of 2. The output was flattened and fed to a first fully connected layer with 16 nodes and again ELU activation. The last layer was a fully connected layer with one node for the final label classification.
In the single-measure (SM)-model approach, we used the architecture described above on each of the nine rs-fMRI summary measures, resulting in nine independent models. In the multi-measure (MM)-ensemble approach, the classification problem is first independently solved for each summary measure as for the SM-model, but the final prediction is computed as the majority vote of the individual binary class predictions. In the MM-model approach, the input of the 3D-CNN is formed by concatenating the summary measures. Each channel is a summary measure, therefore the input is now represented as nine-channel 3D volumes. The other architectural parameters are kept the same.
We used the linear-SVM as a baseline to compare our 3D-CNN models against. For this purpose, the volumes of each participant were first flattened into a 1D array and voxels that were part of a brain mask were used as features for the SVM. The masks were created independently for each summary measure and contained only voxels that appeared as non-zero in at least 90% of the subjects. In the SM-model approach, we trained the linear-SVM on each of the nine summary measures. In the MM-ensemble approach, the outcome of the model was defined as the class with the majority of the SM-model votes. To perform the MM-model, for each subject, the nine volumes representing each of the nine summary measures were flattened and concatenated into a single 1D array. The linear-SVM was then trained on the 1D arrays. In all the linear-SVM experiments we used the default parameter options in the machine learning Python package Scikit-learn⁵ (27, 28).
Cross-Validation Procedure
Cross-validation is a resampling procedure used to evaluate machine learning models on independent data for a limited data sample. For each experiment, we implemented two cross-validation schemes: i) five-fold cross-validation (CV) and ii) a leave-site-out CV procedure.
⁵ https://scikit-learn.org/
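
The layer sizes given under Network Architecture can be checked against the reported parameter count with a few lines of arithmetic. The sketch below assumes a single-channel input of 45 × 54 × 45 voxels (a guess for a 4 mm MNI grid; the input shape is not stated in the text), unpadded convolutions with stride 1, and floor rounding in the pooling layers. Under those assumptions the total comes to 257,585, which agrees with the figure of around 257 k given above and with the exact count quoted in the Discussion.

```python
def conv3d_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of a 3D convolution with a cubic kernel."""
    return in_channels * kernel ** 3 * out_channels + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# ASSUMPTION: single-channel 45 x 54 x 45 input (4 mm MNI grid); not stated in the text.
shape = (45, 54, 45)

shape = tuple(d // 2 for d in shape)  # average pooling, size 2, stride 2
shape = tuple(d - 2 for d in shape)   # conv 1: 64 filters of size 3, no padding (assumed)
shape = tuple(d - 2 for d in shape)   # conv 2: 16 filters of size 3, no padding (assumed)
shape = tuple(d // 2 for d in shape)  # max pooling, kernel 2

# 16 feature maps are flattened before the first fully connected layer.
flattened = 16 * shape[0] * shape[1] * shape[2]

total = (
    conv3d_params(1, 64)
    + conv3d_params(64, 16)
    + linear_params(flattened, 16)
    + linear_params(16, 1)
)

print(shape, flattened, total)  # (9, 11, 9) 14256 257585
```

Almost 90% of the parameters sit in the first fully connected layer, which is why the initial average pooling has such a large effect on model size. For the nine-channel MM-model only the first convolution changes, to `conv3d_params(9, 64)`.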
Before implementing the five-fold cross-validation scheme, the data were split into training/validation-data, consisting of 90% of the total dataset, and test-data, consisting of the remaining 10%. The five-fold cross-validation involved randomly dividing the set of observations into groups, or folds, of approximately equal size. Each fold in turn is treated as the validation set, and the model is fit on the remaining four folds, called the training set (29). In our application, the validation set was used to select the best trained model for each fold. To do so, during training we evaluated the performance on the validation set after each epoch. The model with the best accuracy on the validation set across epochs was used for testing. The test-data are therefore invariant across CV folds. Although this is not the most common application of K-fold CV, the procedure is often used in competitions [e.g., predictive analytic challenge (PAC) 2019, see (30)]. It guarantees no contamination between training/validation-data and test-data and, importantly, the test-data can be easily interpreted as a common benchmark to compare models.
For the leave-site-out CV procedure, we selected the five sites with the highest number of total subjects (ASD/total), namely New York University (NYU) (90/159), Kennedy Krieger Institute - 1 (KKI-I) (88/114), University of Miami - 1 (UM-I) (50/96), University of California, Los Angeles (UCLA)-I (28/69), and Oregon Health & Science University (OHSU)-I (42/64). In this method, each site was held out as the test set in turn. The rest of the data were randomly split into 90% used for training and 10% used for validation. The validation set was used to select the best performing model across epochs, which was then applied to the test-site. In this way, for the leave-site-out CV we had as many test sets as sites (i.e., five).
Occlusion of Brain Regions
We performed an occlusion experiment to assess which regions of the brain maximally determined model performance. We iterated systematically over all the regions of the Harvard-Oxford atlas (thresholded at 25 and downsampled at 4 mm to match our data resolution), occluding each part of the cortex with a mask set to zero and monitoring the output probability of the classifier. We reran the five-fold CV procedure on the test dataset for each occluded region of interest (ROI) and calculated the average balanced accuracy and F1 score across folds. The drop in performance when the ROI is removed from the data, compared to the original data, is a suggestion of how much the voxels contained in the ROI contributed to the original results. The 10 ROIs that contributed the most, that is, those that showed the most substantial drops, are reported.
RESULTS
In our experiments we pursued the goal of classifying the rs-fMRI brain images as belonging to ASD subjects or healthy controls by means of their temporal summary measures. We assessed the potential of each of our nine summary measures to achieve this goal in independent models, the advantage of employing all the measures together in an ensemble model approach, and the use of the measures in a single model with as many input channels as summary measures. In the next sections, we illustrate the results obtained by the 3D-CNN models and by the linear-SVM.
3D-Convolutional Neural Network Results
For our 3D-CNN we used an architecture very similar to that of Khosla et al. (16). Other architectures like ResNet and VGG-net did not perform well, and therefore we only present results from the architecture described in the Network Architecture section. The models were trained with a mini-batch size of 32, for a maximum of 50 epochs. The loss on the validation set had converged by then. The neural network weights were optimized using the binary cross-entropy loss and stochastic gradient descent (SGD) with a learning rate of 0.001 and momentum of 0.9. The same model and parameters were used across all experiments.
Regarding the variability in classification performance between models, we observed a difference between the two CV procedures. When assessed by the five-fold CV procedure, all tested models performed at around 60% accuracy for the classification task (Figure 2 and Table 2), with little variation between models' performances. In contrast, the leave-site-out CV procedure shows high variability in model performance, with relatively good performance for the NYU site and poor performance for the other sites (Figure 3, Table 3, and Supplementary Table S1).
The overall best performing model was the MM-ensemble, which achieved a balanced accuracy of 64% and an F1 score of 66%, averaged over the five folds of the CV procedure. With the leave-site-out CV procedure the performance of the MM-ensemble dropped to a mean balanced accuracy of 56% and an F1 score of 59%, suggesting that this model does not generalize well to new sites. The overall lowest performance was obtained by the entropy SM-model.
Among the SM-models, the ReHo SM-model is the best performing and its classification accuracy is similar to that of the MM-ensemble when evaluated with five-fold CV (balanced accuracy mean: 64%, F1-score mean: 66%). Its performance remained comparable when tested using the leave-site-out CV procedure (balanced accuracy mean: 62%, F1-score mean: 64%).
The idea of the ensemble strategy is to learn several different weak learners and combine them to output predictions based on the multiple predictions returned by these weak models. Therefore the success of the MM-ensemble strategy should be due to the independent contribution of the information gathered from the nine summary measures. To assess whether the SM-models are processing independent information to classify the subjects, we estimated Kendall-correlations between the predictions of each SM-model, separately for each fold. If the correlation matrix shows a cluster of SM-models whose predictions are highly correlated, this would suggest that the SM-models are picking up on similar patterns to classify ASD from CON and therefore share information. To assess the contribution of each measure to the MM-ensemble output, we estimated Kendall-correlations between each SM-model and the MM-ensemble prediction, again separately for each fold. If any of the SM-models has a large influence on the MM-ensemble, this
FIGURE 2 | Balanced accuracy and F1-score for the nine single measure (SM)-models trained on the summary measures, and for the MM-ensemble and multi-
measure (MM)-model.
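
For reference, the two scores shown in Figure 2 can be computed from binary predictions as in the sketch below. It uses the standard definitions (balanced accuracy as the mean of sensitivity and specificity, F1 as the harmonic mean of precision and recall). Treating ASD as the positive class (label 1) is an assumption, both classes are assumed to be present, and the labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to unequal group sizes."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


# Made-up example: 4 ASD (1) and 6 CON (0) subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

bal_acc = balanced_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(round(bal_acc, 3), round(f1, 3))
```

Balanced accuracy is the more informative of the two when a test site has very unequal numbers of ASD and CON subjects, because a classifier that always predicts the majority class scores only 0.5.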
TABLE 2 | Mean performance evaluated as balanced accuracy (accuracy) and F1-score obtained for 3D convolutional neural network (CNN) and linear support vector machine (SVM) with five-fold cross-validation (CV).
                         3D-CNN                 SVM
                         Accuracy   F1-score    Accuracy   F1-score
ReHo                     0.64       0.65        0.66       0.66
fALFF                    0.62       0.63        0.57       0.58
Degree centrality        0.61       0.62        0.63       0.64
VMHC                     0.61       0.62        0.62       0.62
Eigenvector centrality   0.60       0.61        0.61       0.63
Autocorr lag             0.59       0.61        0.57       0.59
LFCD                     0.59       0.60        0.63       0.65
ALFF                     0.59       0.60        0.57       0.58
Entropy                  0.54       0.49        0.56       0.57
MM ensemble              0.64       0.66        0.66       0.67
MM model                 0.59       0.60        0.61       0.62
In bold = Highest score in that column.
would appear as a stronger Kendall-correlation between their predictions. The correlation results, averaged across folds, are reported in Figure 4. There were no discernible clusters of SM-models whose predictions aligned, and all the correlations between SM-models ranged between 0.77 and 0.90. In the same fashion, none of the predictions of the SM-models stood out for correlating with the prediction of the MM-ensemble, and all the correlations ranged between 0.77 and 0.83. The analysis illustrates that the degree of correlation varies more between the predictions of SM-models than between each SM-model and the MM-ensemble prediction. Interestingly, while the ReHo SM-model and the MM-ensemble reached a similar accuracy, this cannot be explained as a more substantial contribution of ReHo to the MM-ensemble, since the correlation between the two is not stronger than between the MM-ensemble and the other SM-models' predictions. These observations indicate that the MM-ensemble benefits from independent contributions from each of the SM-model outputs.
Linear-Support Vector Machine Results
In order to establish a baseline for the performance of the 3D-CNN models, we performed the classification task following the same procedures but now using a linear SVM. As for the 3D-CNN experiments, we implemented the two evaluation schemes: 1) five-fold cross-validation (CV) and 2) a leave-site-out CV procedure. Linear-SVM does not require a validation procedure to select the algorithm, but to be able to compare the results to those obtained by the 3D-CNN experiments, we applied exactly
FIGURE 3 | Leave-site-out balanced accuracy and F1-score for the nine single measure (SM)-models trained on the summary measures mentioned on the y axis,
and for the multi-measure (MM)-ensemble and MM-model for each test-site indicated in the legend.
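
The leave-site-out scheme behind Figure 3 can be sketched as a small split generator: each selected site is held out as the test set in turn, and the remaining subjects are randomly divided 90/10 into training and validation sets. The function name, the seed, and the toy site labels are hypothetical; this illustrates the scheme described in the Cross-Validation Procedure section and is not the code used in the study.

```python
import random


def leave_site_out_splits(sites, test_sites, val_fraction=0.1, seed=0):
    """Yield (test_site, train_idx, val_idx, test_idx) for each held-out site."""
    rng = random.Random(seed)
    for test_site in test_sites:
        test_idx = [i for i, s in enumerate(sites) if s == test_site]
        rest = [i for i, s in enumerate(sites) if s != test_site]
        rng.shuffle(rest)  # random 90/10 split of the remaining subjects
        n_val = max(1, round(val_fraction * len(rest)))
        yield test_site, rest[n_val:], rest[:n_val], test_idx


# Toy example: one site label per subject (labels are made up).
sites = ["NYU"] * 6 + ["KKI"] * 4 + ["OTHER"] * 10
splits = list(leave_site_out_splits(sites, ["NYU", "KKI"]))

for test_site, train_idx, val_idx, test_idx in splits:
    print(test_site, len(train_idx), len(val_idx), len(test_idx))
```

Because the held-out site never contributes to training or model selection, a drop relative to five-fold CV reflects how well a model transfers to an unseen scanner and acquisition protocol.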
TABLE 3 | Mean performance evaluated as balanced accuracy (accuracy) and F1-score obtained for 3D convolutional neural network (CNN) and linear support vector machine (SVM) as SM-models, MM ensemble, and MM model with leave-site-out cross-validation (CV).
                         3D-CNN                 SVM
                         Accuracy   F1-score    Accuracy   F1-score
ReHo                     0.62       0.64        0.63       0.62
fALFF                    0.59       0.63        0.61       0.61
VMHC                     0.56       0.57        0.62       0.59
Degree centrality        0.56       0.56        0.60       0.59
fALFF                    0.54       0.56        0.57       0.54
LFCD                     0.53       0.56        0.63       0.61
Eigenvector centrality   0.51       0.49        0.59       0.57
Entropy                  0.51       0.49        0.53       0.52
Autocorr                 0.51       0.42        0.54       0.55
MM ensemble              0.56       0.59        0.53       0.56
MM model                 0.56       0.58        0.61       0.62
See Figure 3 and Supplementary Tables S1 and S2 for details on how each test site performed.
In bold = Highest score in that column.
the same procedure described above and limited the data used to train the linear-SVM to 90% of the training/validation dataset. The averaged balanced accuracy and F1-score for each experiment evaluated via five-fold and leave-site-out CV are reported in Tables 2 and 3, respectively (see also Supplementary Table S2 for the results of each single site in the leave-site-out CV).
The results of the linear-SVM confirmed the patterns identified by the 3D-CNN experiments, with the linear-SVM performing the classification task at around the 60% accuracy level. The highest balanced accuracy and F1 scores were achieved by the MM-ensemble (66 and 67% respectively, five-fold CV), but it suffered when tested on new sites (balanced accuracy mean: 53%, F1-score mean: 56%). While the accuracy of the linear-SVM MM-ensemble evaluated by five-fold CV is in line with that of its 3D-CNN counterpart, its decrease in performance when evaluated by leave-site-out CV was steeper than what was observed for the 3D-CNN MM-ensemble.
Again, the ReHo SM-model was the best performing model in comparison to the other SM-models and performed comparably to the MM-ensemble when the outcome was evaluated with five-fold CV (balanced accuracy mean: 66%, F1-score mean: 66%). Interestingly, its performance remained comparable when evaluated using the leave-site-out procedure (balanced accuracy mean: 63%, F1-score mean: 62%). The lowest performance was again observed for the entropy SM-model.
Occlusion of Brain Regions Results
To explore which brain regions were maximally contributing to the classification results, we performed an occlusion experiment
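
The occlusion procedure can be sketched in a few lines. The example below is a toy illustration, not the code used in the study: volumes are flat lists of voxel values, the atlas is a parallel list of integer ROI labels with 0 assumed to mean background, `classify` is a hypothetical stand-in for a trained model, and plain accuracy replaces the balanced accuracy and F1 score used in the paper. Each ROI is set to zero in turn and the drop in performance relative to the unoccluded data is recorded.

```python
def occlude(volume, atlas, roi):
    """Return a copy of the volume with every voxel of the given ROI set to zero."""
    return [0.0 if label == roi else value for value, label in zip(volume, atlas)]


def accuracy(classify, volumes, labels):
    """Fraction of volumes for which the classifier returns the true label."""
    hits = sum(1 for v, y in zip(volumes, labels) if classify(v) == y)
    return hits / len(labels)


def occlusion_drops(classify, volumes, labels, atlas):
    """Map each ROI label to the drop in accuracy caused by occluding that ROI."""
    baseline = accuracy(classify, volumes, labels)
    drops = {}
    for roi in sorted(set(atlas) - {0}):  # ASSUMPTION: label 0 marks background
        occluded = [occlude(v, atlas, roi) for v in volumes]
        drops[roi] = baseline - accuracy(classify, occluded, labels)
    return drops


# Toy data: four "voxels" in two ROIs; the stand-in classifier only uses ROI 1.
atlas = [1, 1, 2, 2]


def classify(volume):
    return 1 if volume[0] + volume[1] > 1.0 else 0


volumes = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.7],
    [0.7, 0.9, 0.5, 0.5],
    [0.2, 0.1, 0.3, 0.9],
]
labels = [1, 0, 1, 0]

drops = occlusion_drops(classify, volumes, labels, atlas)
print(drops)  # {1: 0.5, 2: 0.0}
```

Occluding ROI 1 removes everything the toy classifier relies on, so accuracy falls to chance, while occluding ROI 2 changes nothing. Ranking ROIs by this drop gives the kind of ordering reported for the ten most influential regions.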
FIGURE 5 | Results of the occlusion experiment for the regional homogeneity (ReHo) SM-model, balanced accuracy, and f1 are reported as mean and standard deviation across five folds. “all brain” indicates the results obtained by the original model, when all the ROIs are included. Region of interests (ROIs) have been identified using the Harvard-Oxford atlas (threshold at 25 and downsampled at 4 mm). The ROI names indicate the results when the named ROI is masked from the brain volume, therefore the drop from the “all brain” result is a suggestion of how much the voxels contained in the ROI contributed to the original results. Only the 10 ROIs that contributed the most are reported. Sup., superior portion; Inf., inferior portion; Lat., lateral portion; Post., posterior; Temp., temporal.
DISCUSSION
We have proposed a machine learning solution for using rs-fMRI that does not compromise its spatial properties. And we presented an empirical analysis of how the choice of summary of the temporal dimension via various statistical measures can impact the performance of a 3D convolutional neural network in classifying ASD subjects from the control subjects. We considered nine different measures and used them as inputs in a 3D-CNN model either as i) independent inputs to different 3D-CNNs (SM-models), ii) an ensemble of results from the nine independent 3D-CNN models to one output (MM-ensemble), or iii) a combined nine channel 3D-CNN model that used each measure as a channel.
Our analyses suggest that using a single summary measure is often suboptimal for training 3D-CNNs, and more accurate predictions can be achieved with an ensemble approach, even in a heterogeneous dataset such as ABIDE I+II. Each single
summary measure extracts specific information from the rs-fMRI data, capturing local or global aspects of the connectivity of the voxels in the volumes. Even though different modalities may result in similar accuracy, the trained models may contain distinct information. This is confirmed by the correlation matrix of predictions (Figure 4). We calculated the agreement between all the SM-models and between each SM-model and the MM-ensemble. Combining models from different modalities enhances the performance, and creating an ensemble of these measures at the last stage of outcome prediction, as done by the MM-ensemble model, seems to take advantage of the multiple representations of the data without being affected by the increased noise.
Our MM-model's average performance was below 60%, which is 4% less than the MM-ensemble and less than many of the SM models. Here, summary measures that do not convey information about the classification task but were pooled together with informative summary measures potentially increased the noise of the input data and therefore complicated the classification task for the model.
The concept of transforming weak learning algorithms into stronger learners by ensembling them has proven successful in a number of computer vision tasks (31, 32). In the study of Khosla and colleagues (16), the authors reported having overcome the limitation of traditional machine learning models for connectomes that rely on region-based summary statistics by ensembling the different atlases into a single model, with a small gain in accuracy, for a final performance of 72.3%.
Even though the MM-ensemble approach resulted in the best performance on the ASD classification, some of the single summary measures also reached a good classification performance. In particular, ReHo resulted in the best accuracy across the SM-models. However, this knowledge is only available after performing the experiments and is thus difficult to anticipate when choosing a particular summary measure. The MM-ensemble approach seems to benefit from the performance of ReHo without the need to select this measure a priori.
Differences in ReHo between ASD and CON have indeed been reported in the literature. For example, the pericalcarine visual cortex was found to be locally hyperconnected in ASD compared to CON (33), and subjects with ASD have right dominant ReHo alterations of resting-state brain activity, i.e., in areas known to exhibit abnormal stimulus or task related functionality (34). Decreased ReHo in the ASD group compared to the CON group was found in bilateral middle and superior frontal gyri, left superior parietal lobule, and right precuneus. Increased ReHo in the ASD group compared to the CON group was found in bilateral middle temporal and right parahippocampal gyri. The authors also report that the ReHo in the precuneus correlated with the autistic trait score. Jiang and colleagues found enhanced local connectivity in the middle frontal cortex, left precuneus, and right superior temporal sulcus, and reduced local connectivity in the right insular cortex using ReHo (35).
These results are consistent with our occlusion experiments,
of the brain that had the most influence on the ReHo-SM-model and the ReHo-SVM outputs. The occlusion procedure used here has the advantage of being easily interpretable and of opening a window onto the information used by the models to perform the given task, but it is merely descriptive and no strong conclusions can be drawn from its results. The drop in performance after occluding the precuneus is only ~3 percentage points, suggesting that the algorithms are more likely identifying patterns spanning more than one single brain region. The studies mentioned used small groups to detect differences in ReHo between groups, and our findings on the large ABIDE I+II dataset suggest that ReHo is indeed a sensitive measure for detecting cortical abnormalities in autism.
The field of neuroimaging is benefiting from the development of deep learning techniques, and a growing number of studies have applied deep learning for classification of ASD using the ABIDE dataset. Unfortunately, comparison between results is made hard by the heterogeneity of data preprocessing, data selection, and model selection procedures. We followed the data selection procedure described in (16). They obtained a top accuracy of 72.3% by summarizing the temporal dimension of the data in connectome matrices calculated by averaging stochastically determined regions of interest. Their model therefore outperformed our best model, the MM-ensemble, which reached an average balanced accuracy of 64% and an F1-score of 66% in the K-fold CV procedure (but reduced performance in the leave-site-out procedure, see Table 3). While they used data from ABIDE-I for training and tested on ABIDE-II, we trained all our models on a mix of data from both ABIDE-I and II. Since the increased heterogeneity of our training dataset could possibly explain the decrease in performance, we repeated the analyses of the 3D-CNN models, training the models on the data from ABIDE-I and testing the performance on ABIDE-II (the results are reported in Supplementary Table S3). Our approach of summarizing the rs-fMRI data on the temporal domain still showed lower performance compared with (16). These differences might be because Khosla et al. used the connectivity between ROIs and not the statistics from individual voxels to perform the classification, thus hinting at the possibility that connectivity patterns across the brain contain crucial information for the classification between ASD and CON.
The 3D CNN model we employed in our series of experiments was inspired by the one described by Khosla and colleagues, but we decreased the number of CNN layers from four to two. The reason behind this choice lies in the fact that, contrary to the original model, we do not apply regularization techniques to our 3D-CNN. Our model has a total of 257,585 trainable parameters. This number of free parameters is “small” compared to some state-of-the-art networks for computer vision. As a comparison, ResNet-50 has over 23 million trainable parameters, but the number of brain volumes available for training is also critically smaller than the number of images a ResNet-50 was trained on (e.g., ImageNet has > 14 million images). The number of trainable parameters in our model is suitable for the number of training examples, and a larger
which identified the precuneus and occipital cortex as the regions network will be more prone to overfitting. Indeed, when our
3D-CNN model is left training long enough (approximately 50 epochs), it is able to reach 100% accuracy on the training set at the expense of generalizability to new data: a clear indication of overfitting. For these reasons, it is unlikely that the 3D-CNN model was too shallow or not trained enough.

We built our model on the example of another model in the literature, which has proven successful on the same task of classifying ASD from CON. The original model was trained on different features than ours. The relatively low classification accuracy that even our best model obtained might be a consequence of this choice: the parameters that made the original model successful did not generalize to our input data. Indeed, we performed a non-systematic parameter and hyperparameter search and carried out the full experiments on the best configuration. Unfortunately, the number of adjustable parameters in CNN models is extremely high, and it would be computationally prohibitive to carry out a full systematic search.

The ABIDE dataset is composed of distinct datasets collected by different institutions. The data collected vary in terms of demographics of the participants, scanning hardware, and sequences for data collection; the images therefore vary in quality and resolution. Differences in resolution do not present a concern, because all the images were resampled to the same 4 × 4 × 4 mm³ voxel size in MNI space. Differences in sample composition and data acquisition contribute considerably to the heterogeneity of the data, which has been identified as one of the most important limitations of the ABIDE dataset, and therefore might explain the low accuracy achieved (36).

In general, heterogeneity of the dataset has been pointed out by many as a limitation in performing ML on neuroimaging data (37). Classification accuracy drops significantly in larger population samples and especially when the data are aggregated from different sites (36).

Another possibility is that the time domain of rs-fMRI data contains properties that get lost when the summarizing procedures are applied. Correlation and its derivatives (like our summary measures) are first-order transformations, which do not account for higher-order interactions between time courses. In previous work of our group, we maintained the time dimension while summarizing the spatial dimension using ROIs (Harvard Oxford atlas). This approach obtained an accuracy of 68% using a simple 1D-CNN model on the ABIDE I+II dataset (3).

Another interesting finding is that the linear-SVM performed as well as, and in certain instances better than, the 3D-CNN models. He et al. (8) found that SVMs do as well as 3D-CNNs in other tasks too. We hypothesize that in our task, there was no apparent underlying structure in these 3D summary measures that could be exploited. A linear-SVM can be thought of as a fully connected neural network with a non-linear activation function (the sign function). Our 3D-CNN also included a fully connected last layer that can again be thought of as an SVM on the representations learnt by the preceding convolutional layers. The fact that the 3D-CNN architecture could not outperform the linear-SVM suggests that either there were no low-dimensional patterns that capture the essence of the disorder in these summary measures, or that the amount of data is insufficient for the 3D-CNN to learn interesting structures. The amount of data available might not be sufficient to leverage the ability of CNNs to detect patterns. CNNs are highly flexible models that have been developed in the context of “big data” settings. The sample size in our experiments is large but probably not large enough to take full advantage of CNN models. This could explain accuracies similar to those of much less flexible linear-SVM models.

We have shown that simple temporal transformations can lead to accuracies comparable to the state of the art for a complex task like classifying ASD from control subjects. But we also found that there is not much advantage in using a 3D-CNN architecture for this task. In this and our previous studies, we have shown various ways of reducing the dimension of the rs-fMRI signal before feeding it into a machine learning algorithm. In the future, we plan to utilize the full 4D structure of the rs-fMRI without compromising the resolution in either time or space. This can be achieved, for example, by exploiting larger datasets like the UK Biobank⁶ to learn representations for rs-fMRI signals, which can then be used in small-sample psychiatric datasets like ABIDE.

DATA AVAILABILITY STATEMENT

The datasets generated and the 3D convolutional models used for this study are available on request to the corresponding author.

AUTHOR CONTRIBUTIONS

SG and RT contributed equally to the paper and are co-first authors. SG performed the experiments and drafted the first version of the manuscript. RT designed the architecture of the models and contributed to the manuscript. RT and GW conceived the problem. LC, AA-G, and PZ contributed to designing the experiments and preprocessing of neuroimaging data. All authors contributed to the final manuscript.

FUNDING

This work was supported by the Netherlands Organization for Scientific Research (NWO/ZonMw Vidi 016.156.318).

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyt.2020.00440/full#supplementary-material

⁶ https://www.ukbiobank.ac.uk/
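The Discussion describes a linear-SVM as a fully connected layer followed by a sign activation. A minimal sketch of that view in plain Python is given here. The toy data, the function names (`linear_layer`, `sign`, `svm_predict`, `train_linear_svm`), and the hyperparameters (`lr`, `reg`, `epochs`) are illustrative assumptions, and the hinge-loss subgradient training is a generic stand-in, not the solver or the features used in this study.

```python
from typing import List, Sequence, Tuple


def linear_layer(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Single fully connected unit: weighted sum of the inputs plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(z: float) -> int:
    """Sign activation mapping the linear output to a class label in {-1, +1}."""
    return 1 if z >= 0 else -1


def svm_predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Linear-SVM decision rule: sign applied to one linear layer."""
    return sign(linear_layer(w, b, x))


def train_linear_svm(
    X: List[List[float]],
    y: List[int],
    lr: float = 0.1,     # assumed learning rate
    reg: float = 0.01,   # assumed L2 regularization strength
    epochs: int = 200,   # assumed number of passes over the data
) -> Tuple[List[float], float]:
    """Fit w and b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * linear_layer(w, b, xi)
            # L2 penalty: shrink the weights a little at every step.
            w = [wj - lr * reg * wj for wj in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


# Toy, linearly separable data (hypothetical, for illustration only).
X = [[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
     [-2.0, -1.0], [-3.0, -2.5], [-1.5, -2.0]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [svm_predict(w, b, x) for x in X]
print(predictions)
```

The same `linear_layer` followed by `sign` could be applied to features produced by convolutional layers, which is the sense in which the last fully connected layer of a CNN resembles an SVM on learned representations.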
REFERENCES

1. Woodward ND, Cascio CJ. Resting-State Functional Connectivity in Psychiatric Disorders. JAMA Psychiatry (2015) 72:743. doi: 10.1001/jamapsychiatry.2015.0484
2. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature (2015) 521:436–44. doi: 10.1038/nature14539
3. Gazzar AE, El Gazzar A, Cerliani L, van Wingen G, Thomas RM. Simple 1-D Convolutional Networks for Resting-State fMRI Based Classification in Autism. 2019 Int Joint Conf Neural Networks (IJCNN) (2019) 1–6. doi: 10.1109/ijcnn.2019.8852002
4. Sherkatghanad Z, Akhondzadeh M, Salari S, Zomorodi-Moghadam M, Abdar M, Acharya UR, et al. Automated Detection of Autism Spectrum Disorder Using a Convolutional Neural Network. Front Neurosci (2020) 13:736. doi: 10.3389/fnins.2019.01325
5. Heinsfeld AS, Franco AR, Craddock RC, Buchweitz A, Meneguzzi F. Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage Clin (2018) 17:16–23. doi: 10.1016/j.nicl.2017.08.017
6. Guo X, Chen H, Long Z, Duan X, Zhang Y, Chen H. Atypical developmental trajectory of local spontaneous brain activity in autism spectrum disorder. Sci Rep (2017) 7:39822. doi: 10.1038/srep39822
7. Li G, Rossbach K, Jiang W, Du Y. Resting-state brain activity in Chinese boys with low functioning autism spectrum disorder. Ann Gen Psychiatry (2018) 17:47. doi: 10.1186/s12991-018-0217-z
8. He T, Kong R, Holmes AJ, Nguyen M, Sabuncu MR, Eickhoff SB, et al. Deep Neural Networks and Kernel Regression Achieve Comparable Accuracies for Functional Connectivity Prediction of Behavior and Demographics. Neuroimage (2020) 206:116276. doi: 10.1101/473603
9. Schulz M-A, Thomas Yeo BT, Vogelstein JT, Mourao-Miranada J, Kather JN, Kording K, et al. Deep learning for brains?: Different linear and nonlinear scaling in UK Biobank brain images vs. machine-learning datasets. BioRxiv (2020). doi: 10.1101/757054
10. Di Martino A, Yan C-G, Li Q, Denio E, Castellanos FX, Alaerts K, et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol Psychiatry (2014) 19:659–67. doi: 10.1038/mp.2013.78
11. Van Dijk KRA, Hedden T, Venkataraman A, Evans KC, Lazar SW, Buckner RL. Intrinsic functional connectivity as a tool for human connectomics: theory, properties, and optimization. J Neurophysiol (2010) 103:297–321. doi: 10.1152/jn.00783.2009
12. Cameron C, Sharad S, Brian C, Ranjeet K, Satrajit G, Chaogan Y, et al. Towards Automated Analysis of Connectomes: The Configurable Pipeline for the Analysis of Connectomes (C-PAC). Front Neuroinform (2013) 7. doi: 10.3389/conf.fninf.2013.09.00042
13. Friston KJ, Holmes AP, Worsley KJ, Poline J-P, Frith CD, Frackowiak RSJ. Statistical parametric maps in functional imaging: A general linear approach. Hum Brain Mapp (1994) 2:189–210. doi: 10.1002/hbm.460020402
14. Behzadi Y, Restom K, Liau J, Liu TT. A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. Neuroimage (2007) 37:90–101. doi: 10.1016/j.neuroimage.2007.04.042
15. Avants BB, Tustison N, Song G. Advanced normalization tools (ANTS). Insight J (2009) 2:1–35.
16. Khosla M, Jamison K, Kuceyeski A, Sabuncu MR. Ensemble learning with 3D convolutional neural networks for functional connectome-based prediction. NeuroImage (2019) 199:651–62. doi: 10.1016/j.neuroimage.2019.06.012
17. Zang Y, Jiang T, Lu Y, He Y, Tian L. Regional homogeneity approach to fMRI data analysis. NeuroImage (2004) 22:394–400. doi: 10.1016/j.neuroimage.2003.12.030
18. Zang Y-F, He Y, Zhu C-Z, Cao Q-J, Sui M-Q, Liang M, et al. Altered baseline brain activity in children with ADHD revealed by resting-state functional MRI. Brain Dev (2007) 29:83–91. doi: 10.1016/j.braindev.2006.07.002
19. Zou Q-H, Zhu C-Z, Yang Y, Zuo X-N, Long X-Y, Cao Q-J, et al. An improved approach to detection of amplitude of low-frequency fluctuation (ALFF) for resting-state fMRI: Fractional ALFF. J Neurosci Methods (2008) 172:137–41. doi: 10.1016/j.jneumeth.2008.04.012
20. Zuo X-N, Di Martino A, Kelly C, Shehzad ZE, Gee DG, Klein DF, et al. The oscillating brain: complex and reliable. Neuroimage (2010) 49:1432–45. doi: 10.1016/j.neuroimage.2009.09.037
21. Zuo X-N, Ehmke R, Mennes M, Imperati D, Xavier Castellanos F, Sporns O, et al. Network Centrality in the Human Functional Connectome. Cereb Cortex (2012) 22:1862–75. doi: 10.1093/cercor/bhr269
22. Wang Z, Li Y, Childress AR, Detre JA. Brain entropy mapping using fMRI. PloS One (2014) 9:e89948. doi: 10.1371/journal.pone.0089948
23. Tomasi D, Volkow ND. Association between functional connectivity hubs and brain networks. Cereb Cortex (2011) 21:2003–13. doi: 10.1093/cercor/bhq268
24. Kaneoke Y, Donishi T, Iwatani J, Ukai S, Shinosaki K, Terada M. Variance and autocorrelation of the spontaneous slow brain activity. PloS One (2012) 7:e38131. doi: 10.1371/journal.pone.0038131
25. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. (2016). p. 770–8.
26. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv [csCV] (2014). http://arxiv.org/abs/1409.1556
27. Avila J, Hauck T. scikit-learn Cookbook: Over 80 recipes for machine learning in Python with scikit-learn. Packt Publishing (2017).
28. Garreta R, Moncecchi G. Learning scikit-learn: Machine Learning in Python. Packt Publishing (2013).
29. James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: with Applications in R. Springer (2013).
30. Peng H, Gong W, Beckmann CF, Vedaldi A, Smith SM. Accurate brain age prediction with lightweight deep neural networks. BioRxiv doi: 10.1101/2019.12.17.879346
31. Opelt A, Fussenegger M, Pinz A, Auer P. Weak Hypotheses and Boosting for Generic Object Detection and Recognition. Lecture Notes Comput Sci (2004) 3022:71–84. doi: 10.1007/978-3-540-24671-8_6
32. Opelt A, Pinz A, Fussenegger M, Auer P. Generic object recognition with boosting. IEEE Trans Pattern Anal Mach Intell (2006) 28:416–31. doi: 10.1109/tpami.2006.54
33. Jao Keehn RJ, Nair S, Pueschel EB, Linke AC, Fishman I, Müller R-A. Atypical Local and Distal Patterns of Occipito-frontal Functional Connectivity are Related to Symptom Severity in Autism. Cereb Cortex (2018). doi: 10.1093/cercor/bhy201
34. Paakki J-J, Rahko J, Long X, Moilanen I, Tervonen O, Nikkinen J, et al. Alterations in regional homogeneity of resting-state brain activity in autism spectrum disorders. Brain Res (2010) 1321:169–79. doi: 10.1016/j.brainres.2009.12.081
35. Jiang L, Hou X-H, Yang N, Yang Z, Zuo X-N. Examination of Local Functional Homogeneity in Autism. BioMed Res Int (2015) 2015:174371. doi: 10.1155/2015/174371
36. Nielsen JA, Zielinski BA, Fletcher PT, Alexander AL, Lange N, Bigler ED, et al. Multisite functional connectivity MRI classification of autism: ABIDE results. Front Hum Neurosci (2013) 7:599. doi: 10.3389/fnhum.2013.00599
37. Thomas RM, Bruin W, Zhutovsky P, van Wingen G. Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders. Mach Learn (2020), 249–66. doi: 10.1016/b978-0-12-815739-8.00014-6

Conflict of Interest: GW received funding from Philips Research for another research project.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Thomas, Gallo, Cerliani, Zhutovsky, El-Gazzar and van Wingen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.