The ABIDE dataset is aggregated from multiple sites that use different scanner types and acquisition protocols, with differing parameters such as scanning time and repetition time (TR). Hence, the dataset contains site-specific variations that compromise the consistency between sites. To account for these site-specific sources of variability and assess the robustness of the classification model, we perform leave-one-site-out cross-validation experiments. In each training run, one site is left out and used as the test set to evaluate the model. The motivation for this experimental setup is to test the adaptability of the model to previously unseen sites. We use it to compare $p$-GCN with the proposed $s$-GCN and $ss$-GCN approaches. For an equitable comparison between the models, we choose the best-performing atlases for $p$-GCN and $s$-GCN as shown in Figure \ref{fig:exp_1} (\texttt{cc\_200} and \texttt{H0}, respectively), and the best summary for $ss$-GCN as in \mbox{Table \ref{table_summary}} (\texttt{ReHo}).
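
The leave-one-site-out protocol described above can be sketched as follows. This is a minimal illustration and not the code used in our experiments: the function name \texttt{leave\_one\_site\_out} and the toy site labels are hypothetical, and the only assumption is that every subject carries exactly one site label.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx) once per distinct site.

    `sites` holds one site label per subject, in subject order.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    for held_out in sorted(by_site):
        # All subjects of the held-out site form the test set.
        test_idx = by_site[held_out]
        # Subjects from every other site form the training set.
        train_idx = sorted(
            i for site, idxs in by_site.items() if site != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy site labels (hypothetical, not the real ABIDE composition).
toy_sites = ["A", "A", "B", "C", "B", "C", "C"]
folds = list(leave_one_site_out(toy_sites))
```

Each fold trains on all remaining sites and evaluates on the held-out one, so no subject from the test site is seen during training.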

We report the accuracy scores on the 5 sites that contribute the most subjects to ABIDE. Details of the subject composition across sites can be found in Appendix \ref{site_distributions}. Note that data from the same acquisition center but a different ABIDE collection (I or II) are treated as coming from different sites, because data in ABIDE-II from the same center can differ in scanning protocol, repetition time (TR), and other acquisition settings. Table \ref{loso} gives the accuracy scores (with standard deviations) of the leave-one-site-out experiment on these 5 sites. For 4 out of 5 sites, our proposed approaches outperform the baseline $p$-GCN method. In particular, for the sites \texttt{ABIDEII-KKI\_1} and \texttt{ABIDEII-GU\_1}, our $ss$-GCN approach improves over the $p$-GCN method by 18.8\% and 8.9\%, respectively. The large variation in performance across sites is a result of the vast heterogeneity in the data between sites. This heterogeneity has several causes, such as different SNR per site (see Appendix \ref{snr_distribution}) and different MRI acquisition parameters at every site. Moreover, $s$-GCN takes elements of the cross-correlation matrix as input, whereas $ss$-GCN takes the brain summaries, so the effect of these heterogeneities on the two models could be dramatically different. It can also be seen that the variance in the results of $ss$-GCN and $s$-GCN is much lower than that of $p$-GCN. This indicates the robustness and generalizability of our proposed approaches in classifying autistic subjects (ASD) and healthy controls (CON) across multiple sites.
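
The site convention used above (the same center in a different ABIDE collection counts as a separate site, and only the largest sites are reported) can be sketched as follows. The helper names, the label format produced by \texttt{site\_id}, and the toy subject list are assumptions made for illustration; they do not reproduce the actual ABIDE metadata fields or subject counts.

```python
from collections import Counter


def site_id(collection, center):
    # Same center, different ABIDE collection -> different site label.
    # The "collection-center" format is a hypothetical choice.
    return f"{collection}-{center}"


def largest_sites(subjects, k=5):
    """Return the k site labels with the most subjects, largest first.

    `subjects` is an iterable of (collection, center) pairs, one per subject.
    """
    counts = Counter(site_id(col, cen) for col, cen in subjects)
    return [site for site, _ in counts.most_common(k)]


# Toy subject list (hypothetical centers "X" and "Y", not real ABIDE data).
toy_subjects = (
    [("ABIDEI", "X")] * 3 + [("ABIDEII", "X")] * 2 + [("ABIDEI", "Y")] * 1
)
top_sites = largest_sites(toy_subjects, k=2)
```

In this toy example, center X appears as two distinct sites, one per collection, and both rank ahead of center Y.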

