\section{Experiments}\label{sec:experiments}
In this section, we empirically evaluate the proposed GCAN interpolation on ten standard benchmark datasets. Our goal is to determine whether tuning $\eta$ yields better results than both GCN and GAT.
Details of the experimental setup are given in \Cref{appendix:datasets}.
% In \Cref{appendix:datasets}, we also empirically study the number of training problem instances needed to learn a good parameter value over unseen test instances and the dependence of algorithmic performance on the hyperparameters in the algorithms introduced in \Cref{sec:label_prop}.
%
% Table: gcan with different eta
\begin{table*}[ht]
\small
\centering
\resizebox{\textwidth}{!}{%
\begin{tabular}{|c|*{11}{>{\centering\arraybackslash}p{1.3cm}|}} 
\hline
\textbf{Dataset} & $0.0$ & $0.1$ & $0.2$ & $0.3$ & $0.4$ & $0.5$ & $0.6$ & $0.7$ & $0.8$ & $0.9$ & $1.0$ \\
\hline
CIFAR10 & $0.7888 \pm 0.0010$ 
&$0.7908 \pm 0.0008$ 
&$0.7908 \pm 0.0015$ 
&$0.7907 \pm 0.0012$ 
&$0.7943 \pm 0.0022$ 
&$0.7918 \pm 0.0018$ 
&$0.7975 \pm 0.0017$ 
&$0.7971 \pm 0.0023$ 
&$0.7921 \pm 0.0023$ 
&$0.7986 \pm 0.0028$ 
&$\textbf{0.7984}  \boldsymbol{\pm} \textbf{0.0023}$ \\
\hline
WikiCS &$0.9525 \pm 0.0007$ 
&$0.9516 \pm 0.0006$ 
&$0.9532 \pm 0.0011$ 
&$0.9545 \pm 0.0008$ 
&$0.9551 \pm 0.0015$ 
&$0.9545 \pm 0.0012$ 
&$0.9539 \pm 0.0012$ 
&$\textbf{0.9553} \boldsymbol{\pm} \textbf{0.0012}$ 
&$0.9530 \pm 0.0007$ 
&$0.9536 \pm 0.0009$ 
&$0.9539 \pm 0.0009$ 
\\
\hline
Cora & $0.6132 \pm 0.0218$ 
&$0.8703 \pm 0.0251$ 
&$0.8879 \pm 0.0206$ 
&$0.8396 \pm 0.0307$ 
&$0.8022 \pm 0.0385$ 
&$0.8615 \pm 0.0402$ 
&$ \textbf{0.9011} \boldsymbol{\pm} \textbf{0.0421}$ 
&$0.8088 \pm 0.0362$ 
&$0.8505 \pm 0.0240$ 
&$0.8549 \pm 0.0389$ 
&$0.8725 \pm 0.0334$ \\
\hline
Citeseer & $\textbf{0.7632} \boldsymbol{\pm} \textbf{0.0052}$ 
&$0.6944 \pm 0.0454$ 
&$0.7602 \pm 0.0566$ 
&$0.7500 \pm 0.0461$ 
&$0.7339 \pm 0.0520$ 
&$0.7427 \pm 0.0462$ 
&$0.7588 \pm 0.0504$ 
&$0.7193 \pm 0.0567$ 
&$0.7661 \pm 0.0482$ 
&$0.7266 \pm 0.0412$ 
&$0.7471 \pm 0.0444$ 
\\
\hline
PubMed & $0.9350 \pm 0.0009$ 
&$0.9306 \pm 0.0006$ 
&$0.9356 \pm 0.0009$ 
&$0.9281 \pm 0.0007$ 
&$\textbf{0.9356} \boldsymbol{\pm} \textbf{0.0007}$ 
&$0.9319 \pm 0.0009$ 
&$0.9313 \pm 0.0007$ 
&$0.9288 \pm 0.0009$ 
&$0.9313 \pm 0.0006$ 
&$0.9338 \pm 0.0010$ 
&$0.9356 \pm 0.0009$ 
\\ \hline
CoauthorCS & $0.9733 \pm 0.0007$ 
&$0.9733 \pm 0.0008$ 
&$\textbf{0.9765} \boldsymbol{\pm} \textbf{0.0005}$ 
&$0.9744 \pm 0.0005$ 
&$0.9733 \pm 0.0009$ 
&$0.9690 \pm 0.0007$ 
&$0.9712 \pm 0.0009$ 
&$0.9722 \pm 0.0005$ 
&$0.9722 \pm 0.0011$ 
&$0.9722 \pm 0.0007$ 
&$0.9744 \pm 0.0007$ 
\\ \hline
AmazonPhotos & $0.9605 \pm 0.0022$ 
&$0.9617 \pm 0.0007$ 
&$0.9629 \pm 0.0015$ 
&$0.9599 \pm 0.0013$ 
&$0.9641 \pm 0.0017$ 
&$0.9574 \pm 0.0018$ 
&$0.9641 \pm 0.0019$ 
&$0.9592 \pm 0.0133$ 
&$\textbf{0.9653} \boldsymbol{\pm} \textbf{0.0027}$ 
&$0.9635 \pm 0.0031$ 
&$0.9562 \pm 0.0019$ 
\\ \hline
Actor & $0.5982 \pm 0.0016$ 
&$0.5919 \pm 0.0022$ 
&$\textbf{0.6005} \boldsymbol{\pm} \textbf{0.0039}$ 
&$0.5959 \pm 0.0039$ 
&$0.5965 \pm 0.0038$ 
&$0.5970 \pm 0.0027$ 
&$0.5976 \pm 0.0037$ 
&$0.5993 \pm 0.0043$ 
&$0.5930 \pm 0.0041$ 
&$0.5970 \pm 0.0037$ 
&$0.5953 \pm 0.0031$ 
\\ \hline
Cornell & $0.7341 \pm 0.0097$ 
&$0.7364 \pm 0.0165$ 
&$0.7364 \pm 0.0073$ 
&$0.7205 \pm 0.0154$ 
&$0.7523 \pm 0.0109$ 
&$0.7795 \pm 0.0120$ 
&$0.7568 \pm 0.0188$ 
&$0.7500 \pm 0.0140$ 
&$0.7477 \pm 0.0138$ 
&$0.7909 \pm 0.0136$ 
&$\textbf{0.8000} \boldsymbol{\pm} \textbf{0.0423}$ 
\\ \hline
Wisconsin & $0.8688 \pm 0.0077$ 
&$\textbf{0.8922} \boldsymbol{\pm} \textbf{0.0035}$ 
&$0.8688 \pm 0.0080$ 
&$0.8906 \pm 0.0049$ 
&$0.8797 \pm 0.0044$ 
&$0.8578 \pm 0.0120$ 
&$0.8875 \pm 0.0037$ 
&$0.8781 \pm 0.0082$ 
&$0.8563 \pm 0.0128$ 
&$0.8750 \pm 0.0121$ 
&$0.8719 \pm 0.0076$
\\ \hline
\end{tabular}
}
\caption{Accuracy of the proposed GCAN interpolation for different values of $\eta$. Each column corresponds to one $\eta$ value and each row to one dataset. Each entry shows the mean accuracy over $30$ runs and the associated $90\%$ confidence interval. In most cases, the optimal $\eta$ is neither $0$ (pure GCN) nor $1$ (pure GAT).}
\label{tab:gat_gcn_table}
\end{table*}

\Cref{tab:gat_gcn_table} reports the mean accuracy over $30$ runs for each value of $\eta$, together with the associated $90\%$ confidence interval. The optimal $\eta$ varies across datasets. On most datasets the best model is a strict interpolation between GCN and GAT, so interpolating between the two baselines is enough to improve on both. There are exceptions at either end: GCN ($\eta = 0$) achieves the best accuracy among all interpolations on Citeseer, whereas on datasets such as CIFAR10 and Cornell the highest accuracy is attained with $\eta$ close to $1.0$ (closer to GAT). Moreover, for many datasets the accuracy does not increase or decrease monotonically in $\eta$, and the optimal value can lie anywhere between $0.0$ and $1.0$. This suggests that one should learn the best $\eta$ for each specific dataset; with a well-chosen $\eta$, GCAN can outperform both GAT and GCN.
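
As a reading aid for the intervals in \Cref{tab:gat_gcn_table}, the following sketch assumes that each interval is a two-sided Student-$t$ interval over the $n = 30$ runs; the exact construction is not specified here, so this is an assumption rather than a description of our procedure. Writing $a_1, \dots, a_n$ for the per-run accuracies at a fixed $\eta$ (notation introduced only for this illustration), an entry would then have the form
\[
  \bar{a} \pm t_{0.95,\,n-1}\,\frac{s}{\sqrt{n}},
  \qquad
  \bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i,
  \qquad
  s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(a_i - \bar{a}\right)^2,
\]
with $t_{0.95,\,29} \approx 1.699$ for a $90\%$ two-sided interval. Under this assumption, values of $\eta$ whose intervals overlap substantially should not be read as clearly separated.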

To further evaluate the GCAN architecture, we also compare GCAN with GAT and GCN when the model parameters and the tunable hyperparameter $\eta$ are both optimized by backpropagation; the results are shown in \Cref{fig:gcan_backprop}. The experimental setup is the same as that used for \Cref{tab:gat_gcn_table} and is described in \Cref{appendix:datasets}.

\begin{figure}[ht]
    \centering
    \includegraphics[width=0.8\linewidth]{experiments/gcan_withwarmup.png}
    \caption{GCAN prediction accuracy on the validation (unlabeled) data points versus the number of iterations, trained using backpropagation. The WikiCS and PubMed panels are plotted starting from iteration 100 in order to show detail.}
    \label{fig:gcan_backprop}
\end{figure}

On all six datasets, GCAN achieves an accuracy at least as high as the better of GAT and GCN, which matches our expectations. In our experiments, we also occasionally observe better GCAN performance when $\eta$ takes a value outside the $[0,1]$ range.

% \todo{MAYBE PUT THIS EXPERIMENT TO THE MAIN TEXT BECAUSE IT IS THE MULTIPLE PROBLEM INSTANCE SETTING:

% For each dataset, we draw 20 problem instances, where in each instance we let 20$\%$ of the data to be labeled and the others unlabeled. The rest experiment setup is the same as \ref{tab:gat_gcn_hyperparameter}. 


% }






% In Figure~\ref{fig:gat_gcn_complexity} we also show the fraction of the training data required to learn the best $\eta$ parameter and in turn achieve the best accuracy. As we increase the number of training problem instances, generally we see the unsupervised accuracy increases as expected. Some datasets that might be easier to learn converge to the optimal $\eta$ value with much fewer instances, while more complex graphs require more number of datapoints. The experiment in Figure~\ref{fig:gat_gcn_complexity} finds the best $\eta$ for each specific number of training instances and the average unsupervised accuracy is calculated over $10$ different draws from the test set. For the experiments on other datasets, refer to \Cref{appendix:datasets}.

%This creates the mean and confidence interval shown.
