

This section provides a brief overview of the application of GCNs to autism classification. Figure \ref{fig_models} illustrates the three types of GCN-based models: $p$-GCN and our proposed models $s$-GCN and $ss$-GCN. As can be seen, the pipeline starts from a population graph that comprises two parts: 1) a feature vector that characterizes each node (subject) of the graph, and 2) a similarity measure that defines the edges (relations) between the nodes.

In $p$-GCN, the feature vector for each node is obtained by building a correlation matrix between the time series of all possible pairs of regions in the respective brain atlas. The upper triangle of the correlation matrix is extracted, flattened and passed to a ridge regressor to perform Recursive Feature Elimination (RFE). The reduced set of features obtained from RFE is used to characterize the respective node in the graph. Unlike the feature vectors, the edges are defined using non-imaging phenotypic measures such as age, sex or acquisition site of the fMRI scans (denoted by $M_h$). The function $\gamma$ determines whether an edge exists by comparing the phenotypic data of two subjects, and its definition depends on the type of phenotypic measure integrated in the graph. For categorical information such as the subject's sex, $\gamma$ is defined as the Kronecker delta function $\delta$, meaning that the edge weight between two subjects is increased if, e.g., they have the same sex. Constructing edge weights from quantitative measures (e.g., the subject's age) is slightly less straightforward. In such cases, $\gamma$ is defined as a unit-step function with respect to a threshold. Further details can be found in Section 2.2.2 of \citet{parisot2018disease}. Using this information about each subject, \citet{parisot2018disease} created the adjacency matrix $\mathbf{W}$ of the graph as
\begin{equation}\label{eq:1}
    W_{ij} = Sim(S_i, S_j)\sum_{h=1}^{H} \gamma (M_h(i), M_h(j)).
\end{equation}
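Here, $Sim(S_i, S_j)$ denotes a measure of similarity between the feature vectors of the $i^{\text{th}}$ and $j^{\text{th}}$ subjects, and $H$ is the number of phenotypic measures. The similarity term strengthens the links between similar nodes of the graph and weakens those between less similar ones. Since $W_{ij}$ vanishes whenever $\gamma$ is zero for every phenotypic measure, the resulting adjacency matrix is sparse.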
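
As an illustration of the two cases described above, $\gamma$ can be written out explicitly. The threshold symbol $\theta$ and the strict inequality are notation assumed here for exposition only; the exact form used in $p$-GCN is given in \citet{parisot2018disease}. For a categorical measure $M_h$, the Kronecker delta reads
\[
    \gamma (M_h(i), M_h(j)) = \left\{
    \begin{array}{ll}
        1 & \mbox{if } M_h(i) = M_h(j), \\
        0 & \mbox{otherwise,}
    \end{array}
    \right.
\]
whereas for a quantitative measure, a unit-step function with respect to the threshold $\theta$ can be written as
\[
    \gamma (M_h(i), M_h(j)) = \left\{
    \begin{array}{ll}
        1 & \mbox{if } |M_h(i) - M_h(j)| < \theta, \\
        0 & \mbox{otherwise.}
    \end{array}
    \right.
\]
For instance, with $H = 2$ categorical measures (sex and acquisition site), two subjects who agree on both receive $W_{ij} = 2\,Sim(S_i, S_j)$, two who agree on only one receive $W_{ij} = Sim(S_i, S_j)$, and two who agree on neither receive $W_{ij} = 0$, i.e., no edge.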

The primary disadvantage of using non-imaging data such as site information to create the adjacency matrix is the lack of flexibility when scaling to larger datasets, especially if a site has very few subjects or if a new site is added to the database. Moreover, the adjacency matrix defined by \mbox{Eq.~\ref{eq:1}} compromises the effectiveness of the GCN architecture. The major advantage of a GCN is its capability to combine information from two different channels, defined on its nodes and edges, respectively. However, because the fMRI-based similarity measure $Sim(S_i, S_j)$ is also used to define the edges, resting-state fMRI information enters both channels of $p$-GCN, and this overlap limits its discriminative power. To circumvent these issues, the as yet unused structural information of the brain can be used to measure the similarity in brain structure between subjects. The motivation for and advantages of using sMRI information to define connections between subjects are discussed in the subsequent section.



% Thus, we alleviate the use of metadata such as age, gender, \emph{etc.}, and derive the corresponding similarity measure from actual brain data of the patients. In $p$-GCN and its variants \cite{parisot2018disease, Kazi2018arxiv}, adjacency matrices are derived from phenotypic data so that the GCN model can combine features of similar subjects and be more expressive. Clearly, the choice of similarity measure being used to define the adjacency matrices seems to be promising, and it is of interest to see whether sMRI data can provide more reliable information than the conventionally used phenotypic data.



% The application of GCN for brain analysis in populations was first introduced by \cite{parisot2018disease}, and we introduce here the original implementation of GCN for autism classification as described in  their work.  

% For a better understanding of GCN, first the concept of population graph construction is explained. To obtain the feature vector for every node, a correlation matrix is built. This matrix characterizes the relation between the time series values corresponding to all possible pairs of regions in the respective atlas of study. The information contained in the upper triangle of the correlation matrix is  extracted, flattened and then passed to a ridge regressor to perform recursive feature elimination (RFE). The reduced set of features obtained from RFE are used to characterize the respective node in the graph.

% The edges of the graph provide association between subjects' feature vectors. To define these edges, \citet{parisot2018disease} use non-imaging phenotypic measures $\mathbf{M} = \{M_h\}$(\emph{e.g.} age or sex of the subject), and the corresponding adjacency matrix $\mathbf{W}$ is constructed as follows.
% \begin{equation}
%     W_{ij} = Sim(S_i, S_j)\sum_{h=1}^{H} \gamma (M_h(v), M_h(w)).
% \end{equation}
% Here, $Sim(S_i, S_j)$ denotes the measure of similarity between the $i^{\text{th}}$ and $j^{\text{th}}$ subjects, thus strengthening the links between similar nodes of the graph and weakening the less similar ones. The term $\gamma$ defines the measure of distance between the phenotypic measures and can vary based on the choice of phenotypic data used in the graph. For further details on building the edges, see \citet{parisot2018disease}.

% {\color{dpk} THIS PART NEEDS TECHNICAL POLISHING: After the feature vectors and graph edges have been defined, spectral graph convolutions are computed through multiplications in the Fourier domain \cite{Shuman2013}. The normalized Laplacian of the graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{W} \}$ is defined as $\mathcal{L} = \mathbf{I}_N - \mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}$, where $\mathbf{I}_N$ and $\mathbf{D}$ denote an identity matrix of size $N$ and the diagonal degree matrix, respectively. An eigen decomposition of the Laplacian matrix $\mathcal{L}$ gives a set of eigen vectors, and among these, the vector with low frequencies/eigen values vary slowly across the graph, which means that the vertices connected by an edge of larger weight have similar values in the corresponding location of these eigen vectors.}  

% For any spatial signal $\mathbf{x}$ defined on graph $\mathcal{G}$, the respective spectral deconvolution with the filter $g_{theta} = diag(\theta)$, expressed as multiplication in the Fourier domain, can be stated as,
% \begin{equation}
% g_{\theta} \circledast \mathbf{x} = g_{\theta}(\mathbf{U}\wedge \mathbf{U}^{\intercal})\mathbf{x} = \mathbf{U}g_{\theta}(\wedge)\mathbf{U}^{\intercal}\mathbf{x},
% \end{equation}
% where $\theta \in \mathbb{R}^N$ are the parameters of filter $g_{\theta}$. 
% The choice of filters is restricted to polynomials as $g_{\theta}(\wedge) = \sum_{k=0}^K \theta_k \wedge^k$.

