Neuroimaging holds the promise of objective diagnosis and prognosis in psychiatry. However, unlike neurological disorders, psychiatric disorders do not show obvious alterations in the physical appearance of the brain. Thus, structural Magnetic Resonance Imaging (sMRI) scans of the brain do not reveal differences between a healthy and a pathological brain. Researchers have long posited that the patterns which distinguish the two are found not in sMRI, but in resting-state functional MRI (rs-fMRI) scans instead \cite{zhan2015window}. These scans map the blood oxygenation level (a proxy for brain activity) throughout the brain at intervals of 1--2 seconds, resulting in a 4D spatio-temporal image. Typically, at a scanning resolution of 4 mm and 300 temporal sampling points, this results in a feature vector with about 20 million dimensions. Finding patterns in such a high-dimensional space that distinguish healthy from psychiatric subjects remains an open challenge.

One of the major challenges in developing an objective schema for the diagnosis of autism is the scarcity of reliable, consistent and sufficiently large datasets. Recent initiatives such as the Autism Brain Imaging Data Exchange (ABIDE) \mbox{\cite{di2014autism}} have aggregated brain imaging data from autistic (ASD) and typically developing or control (TD or CON) participants from various sites around the world. The complete dataset, comprising ABIDE-I and ABIDE-II, contains over 2100 subjects across the ASD and CON groups. ABIDE has thus become a benchmark dataset for autism classification.

Several machine learning approaches have been used for autism classification, including support vector machines \citep{jiao2010predictive, bi2018classification}, decision trees \citep{jiao2010predictive}, random forests \citep{maenner2016development}, and deep neural networks \citep{Khosla2018ni, gazzar2019simple, kong2019classification}. All these methods rely solely on subject-specific imaging features and therefore fail to encode the similarities or dissimilarities between subjects. However, relational information is highly desirable in autism classification because (a) the datasets are relatively small for a deep learning model, and (b) the data are obtained from multiple sites, leading to inconsistent data points.

Recent approaches using Graph Convolutional Networks (GCNs) \citep{parisot2017miccai, anirudh2019bootstrapping} have been shown to exploit the relations between subjects along with their brain activity patterns. A GCN operates on a population graph in which subjects (the nodes) are connected to similar subjects through edges. The prediction for any new subject can then be based on both the subject-specific data and the relational information from similar subjects. However, almost all recent studies restrict themselves to a subset of the subjects, primarily by rejecting subjects whose scans are too short or contain significant noise \cite{moradi2017predicting, abraham2017deriving, parisot2017miccai, Khosla2018ni}. While this helps in training the models, the adverse effect is reduced generality to noisy and complex test subjects. Another approach \cite{ktena2017distance} uses metric learning to evaluate the distance between graphs, where each graph represents the brain network of one subject and the dataset is a curated subset of ABIDE-I. In this paper, unlike \citet{ktena2017distance}, we cast the problem as node classification on a population graph and develop Deep Learning (DL) models that deliver comparable performance even when using the entire ABIDE dataset, which spans all sites and contains heterogeneous samples.
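
To make the node-classification setting concrete, the following sketches one widely used first-order formulation of a graph convolution layer. It is given for illustration only: the cited GCN approaches may use other spectral filters, and the symbols $N$, $d$, $X$, $A$, $W^{(l)}$ and $\sigma$ are notation assumed here rather than taken from those works. Let $X \in \mathbb{R}^{N \times d}$ collect the feature vectors of the $N$ subjects and let $A \in \mathbb{R}^{N \times N}$ be the (weighted) adjacency matrix of the population graph. With $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$, a layer updates the node representations as
\[
H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right), \qquad H^{(0)} = X,
\]
where $W^{(l)}$ is a learnable weight matrix and $\sigma$ is a nonlinearity. Each subject's representation is thus a weighted combination of its own features and those of its neighbours, and a softmax over the final layer yields a class probability (ASD or CON) for every node, including the unlabelled ones.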
 

% In the recent years, Machine learning (ML) approaches have demonstrated enormous potential in processing images and videos. Deep learning (DL) in particular has been extremely successful in myriad different fields. Apart from the algorithmic improvements, deep learning has been successful largely due to the availability of massive datasets in the image processing fields. Unfortunately, large datasets are scarce or almost non-existent in neuroimaging. 


% But some initiatives like the Autism Brain Imaging Data Exchange (ABIDE) dataset \cite{di2014autism} have tried to aggregate brain imaging dataset of Autistic (ASD) and healthy controls (CON) from various sites around the world to build up a reasonably sized dataset. The complete dataset including ABIDE-I and ABIDE-II comprises around 2100 subjects including ASD and CON. 
% Almost all recent studies limit their study to a subset of the subjects, which primarily involves rejecting subjects with data of too short durations as well as those containing significant noise in them \cite{parisot2017miccai, parisot2018disease, Khosla2018ni}. While this helps in better training of the models, the adverse affect includes reduced generality to noisy and complex test subjects. In this paper, we develop DL models which can deliver comparable performances, even when trained on the entire ABIDE dataset spanning across all sites. 


\begin{figure}
\centering
\begin{tikzpicture}
\centering
    \draw (0, 0) node[inner sep=0] {\includegraphics[scale=0.5]{figures/models/midl_figure.pdf}};
    \draw (6.1, 2.1) node {\scriptsize Healthy (CON)};
    \draw (6.1, 1.73) node {\scriptsize Autistic (ASD)};
    \draw (6.1, 1.4) node {\scriptsize To be classified};
    \draw (-2.2, -2.8) node {\footnotesize 3D CNN};
    \draw (-2.2, -0.28) node {\footnotesize VAE};
    \draw (-2.2, 2.23) node {\scriptsize \scalebox{.85}[1.0]{Correlation and}}; 
    \draw (-2.15, 2.0) node{\scriptsize \scalebox{.77}[1.0]{Dimension Reduction}};
    \draw (-4.7, 2.9) node {\scriptsize Brain Atlases};
    \draw (-4.7, 1.1) node {\scriptsize Phenotypic Data};
    \draw (-4.7, -0.28) node {\scriptsize Structural MRI};
    \draw (-4.67, -1.98) node {\scriptsize Brain Summaries};
    \draw (-6.27, 1.63) node {\footnotesize \textbf{\emph{p}-GCN}};
    \draw (-6.9, 0.2) node {\footnotesize \textbf{\emph{s}-GCN}};
    \draw (-6.40, -1.55) node {\footnotesize \textbf{$ss$-GCN}};
    \draw (1.2, -1.3) node {\scriptsize Incomplete};
    \draw (1.2, -1.6) node {\scriptsize Graph};
    \draw (3.65, -1.3) node {\scriptsize GCN};
    \draw (6.15, -1.3) node {\scriptsize Completed};
    \draw (6.15, -1.6) node {\scriptsize Graph};
\end{tikzpicture}
\vspace{-1em}
\caption{Schematic representation showing different models based on Graph Convolutional Networks (GCN) for the classification of subjects for autism disorders.}
\label{fig_models}
\end{figure}


Despite the availability of the ABIDE dataset, the dimensionality of the input data is too large to use without any preprocessing or feature engineering. Different approaches have been used in the past to reduce the dimensionality of the data. Since rs-fMRI data comprise a spatio-temporal signal, dimensionality reduction can be performed in space, in time, or in both. One approach to spatial downscaling is to use brain atlases, where the roughly one million voxels in space are locally averaged to obtain around 100 to 400 non-overlapping regions. The reduced set of time courses can then be processed directly by a 1D convolutional neural network (CNN) \cite{gazzar2019simple}, or used to build a correlation matrix that, with further processing, provides an even smaller set of features \cite{parisot2018disease}. Features obtained from the correlation matrix are, for example, reduced using recursive feature elimination (RFE), where a subset of features is iteratively removed until a desired dimension is reached.
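
As an illustration of why further reduction is needed after the atlas step, consider the size of the correlation-based feature vector. The sketch below assumes Pearson correlation between regional time courses and vectorization of the upper triangle of the matrix; neither choice is stated above, and the symbols $R$, $T$, $x_i(t)$ and $r_{ij}$ are notation introduced here. For an atlas with $R$ regions and time courses $x_i(t)$, $t = 1, \dots, T$, with temporal means $\bar{x}_i$,
\[
r_{ij} = \frac{\sum_{t=1}^{T} \bigl(x_i(t) - \bar{x}_i\bigr)\bigl(x_j(t) - \bar{x}_j\bigr)}{\sqrt{\sum_{t=1}^{T} \bigl(x_i(t) - \bar{x}_i\bigr)^2 \, \sum_{t=1}^{T} \bigl(x_j(t) - \bar{x}_j\bigr)^2}} .
\]
Because the matrix is symmetric with a unit diagonal, it contains $R(R-1)/2$ distinct entries:
\[
R = 100: \ \frac{100 \cdot 99}{2} = 4950, \qquad R = 400: \ \frac{400 \cdot 399}{2} = 79\,800 .
\]
Even at the coarse end of this range, the number of features exceeds the number of subjects in ABIDE, which is why a selection step such as RFE is typically applied afterwards.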

An alternative approach to treating the 4D brain volumes is to preserve the full spatial resolution and perform reductions only in the temporal dimension. For example, the temporal signals can be summarized at the voxel level using summary measures such as the Amplitude of Low-Frequency Fluctuations (ALFF). ALFF is a measure that is posited to reveal differences in the underlying processing of the brain and is calculated based on the ratio of spectral power in two distinct frequency ranges. To the best of our knowledge, such summaries have not been incorporated in DL models for neuroimaging, and in this paper we explore their applicability. Moreover, we avoid reduction techniques such as RFE in order to prevent excessive loss of information. Instead, we propose to use a Variational Autoencoder (VAE) to project the information onto a lower dimensional representation, and use it as a feature vector for our model.
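
For concreteness, one common way of writing such voxel-wise summaries is sketched below. The notation and the band limits are assumptions made for illustration and are not taken from the processing pipeline used in this work. Let $X_v(f)$ denote the discrete Fourier transform of the time series at voxel $v$, let $\mathcal{F}_{\mathrm{low}}$ be the set of frequency bins in a low-frequency band (often chosen as roughly $0.01$--$0.08$ Hz), and let $\mathcal{F}_{\mathrm{all}}$ be the set of all available bins. Then
\[
\mathrm{ALFF}_v = \frac{1}{|\mathcal{F}_{\mathrm{low}}|} \sum_{f \in \mathcal{F}_{\mathrm{low}}} |X_v(f)| ,
\qquad
\mathrm{fALFF}_v = \frac{\sum_{f \in \mathcal{F}_{\mathrm{low}}} |X_v(f)|}{\sum_{f \in \mathcal{F}_{\mathrm{all}}} |X_v(f)|} .
\]
A ratio between two frequency ranges, as described above, corresponds most closely to the second, normalized form, which is often called fractional ALFF. In either case the $T$ temporal samples at each voxel collapse to a single scalar, so the 4D scan becomes a 3D volume at the original spatial resolution.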


In GCN-based methods, while the features of the subjects are used to characterize the nodes, the definition of edges relies mostly on their phenotypic data (\emph{e.g.} sex, age and acquisition site). However, phenotypic information is merely a proxy, and instead of using it to define connections among the subjects, we propose to use the `actual similarities' between the brains' structures. In the past, sMRI data has been used to understand the variability of brain structure with age~\cite{brickman2007structural, su2012predicting}, gender~\cite{tyan2017gender} and acquisition site~\cite{littmann2006acquisition}. This implies that these phenotypic parameters (age, sex and site) correlate with the structural imaging data. These studies also indicate, for example, that ``brain-age'' need not always coincide with the reported age. To avoid such uncertainties in establishing the edges of the graph, we use the sMRI images directly, which gives us a single variable with which to establish the relationship.
Hence, as opposed to defining relations based on arbitrary metadata in order to infer structural brain similarities, comparing the actual structural data of the subjects should yield a better approximation of similar brain structures. Subjects connected in this way can therefore be assumed to have lower variance in their functional features, since structurally similar brains are expected to behave in a more similar way. Based on this motivation, we hypothesize that the structural images have higher expressibility of subject relations, and propose to use them to build the edges of the population graph. For clarity, here and henceforth, we refer to the approach of \citet{parisot2018disease} and to that based on structural MRI data as \emph{p}-GCN and \emph{s}-GCN, respectively. Furthermore, the approach involving structural MRI data as well as the brain summaries is referred to as $ss$-GCN.




% As autism is a functional disorder, structural information cannot be directly used to classify ASD subjects. However, instead of using different phenotypic information such as age, sex and acquisition sites to define connections among the subjects, the `actual similarities' of the brains' hardwares are used in this work. In the past, sMRI data has been used to understand the variability of brain hardware based on age ~\cite{brickman2007structural, su2012predicting}, gender \cite{tyan2017gender} and acquisition sites  \cite{littmann2006acquisition}. This implies that these phenotypic parameters correlate with the structural imaging data. Hence, as opposed to gathering functional data from subjects with similar arbitrary metadata, data from subjects with similar structural representations can be expected to have lower variance. Based on this motivation, we hypothesize that the structural data of the brain (T1 weighted) can provide a better measure of similarity between subjects than the conventionally used phenotypic data. 


% Among the recently proposed DL models, GCNs have been shown to perform relatively well for autism classification \cite{parisot2017miccai, parisot2018disease, anirudh2019bootstrapping}. GCN creates a graph model where subjects (defined as nodes) are connected to similar ones through edges, and prediction for any new subject can be made based on the weights of the edges connected to the respective node. While the features of the subjects are used to characterize the nodes, the definition of edges relies mostly on their phenotypic data (\emph{e.g.} sex and age). However, phenonotypic data are merely proxies, and previous research works have shown that the employed phenotypic parameters correlate well with structural MRI data \cite{brickman2007structural, tyan2017gender, littmann2006acquisition}. We hypothesize that the structural images have higher expressibility of subject relations, and propose to use them instead of the phenotypic data. This information is used to build the edges of the graph, thereby connecting the nodes based on the proximity of their structural image representations. For better clarity, here and henceforth, we will refer the approaches of \citet{parisot2018disease} and that based on structural MRI data as \emph{p}-GCN and \emph{s}-GCN, respectively. Further, the approach involving structural MRI data as well as the brain summaries will be referred as fused-GCN. 


%In this work, we want to primarily explore the effect of using a full spatial resolution input which is then achieved by summarizing the temporal dimension. This novel approach involves the calculation of various temporal summary measures for each voxel. Most of the summary measures are informed by neuroimaging literature. For example, the Amplitude Low Frequency Fluctuation (ALFF) is a measure that is posited to reveal differences in the underlying processing of the brain and is calculated based on the ratio of spectral power in two distinct frequency ranges. Apart from these summary measures, no previous study in autism classification, to the knowledge of the authors, have incorporated a combination of structural T1 scans (sMRI) and rs-fMRI scans in a consistent manner. 

%We propose the use of graph convolutional neural networks (GCN) for autism classification. Each node of the graph represents a subject and the signal on the graph is the summary measure corresponding to the subject. Thus, the problem is cast as a graph node classification problem. The node (subjects) are connected to each other based on the proximity of their structural image representations, thus providing a natural way to incorporate sMRI for classification.

In this paper, we address the various limitations of the existing methods as outlined above. To summarize, the main contributions\footnote{Code available at https://github.com/RichardOlij/Fusing-ss-GCN-for-Autism-Classification} of our paper are:

\begin{itemize}

\item the fusion of structural and functional resting-state images for autism classification, thus alleviating the need to use non-imaging metadata of patients;

\item the use of various temporal summary measures to reduce the 4D input volume to a 3D volume at the original spatial resolution for classification;

\item a novel 3D CNN-GCN model for improved classification of subjects for autism disorders, in which the CNN module encodes the summarized 3D volumes into lower dimensional feature vectors for the nodes of the graph model.

\end{itemize}