% \section{Additional Experiment Details}\label{appendix:datasets}

\subsection{Experiment Setup for GCAN}\label{appendix:datasets}
We apply dropout with probability $0.4$ to all learnable parameters. The network consists of one head of the specialized attention layer (with the new update rule) followed by an output attention layer. Following prior work \citep{velivckovic2017graph}, we use the ELU activation, with 8 hidden units and 3 attention heads.
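
For reference, the ELU activation is commonly defined as
\[
\mathrm{ELU}(z) =
\left\{
\begin{array}{ll}
z, & z > 0, \\
\alpha \, (e^{z} - 1), & z \le 0,
\end{array}
\right.
\]
where $\alpha > 0$ is a scale parameter. The value of $\alpha$ is not specified here; $\alpha = 1$ is a common default.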
% We start training with an initial learning rate of $7 \times 10^{-5}$ and a weight decay of $5 \times 10^{-4}$.

These GCAN interpolation experiments all use only $20\%$ of each dataset as labeled datapoints; the remaining $80\%$ are unlabeled datapoints on which we evaluate classification accuracy. \Cref{tab:gat_gcn_hyperparameter} lists the exact setup for each dataset and the overall training time of each experiment. Because we want to examine our theory with the simplest network that is still non-linear, we set the hidden dimension to 1. Our theory on sample complexity bounds still applies to larger networks, but implementing our techniques on larger networks and larger graphs might require additional computational improvements.

\begin{table}[ht!]
\centering
\resizebox{\textwidth}{!}{%
\begin{tabular}{lccccccc}
\toprule
Dataset       & Num.\ of train nodes & Learning rate & Epochs & Num.\ of experiments & Train time (sec) & Hidden dim. & Num.\ of attention heads \\
\midrule
CIFAR-10      & 400 & 7e-3 & 1000 & 30 & 13.5354 & 1 & 3 \\
WikiCS        & 192 & 7e-3 & 1000 & 30 & 6.4742  & 1 & 3 \\
Cora          & 170 & 7e-3 & 1000 & 30 & 7.4527  & 1 & 3 \\
Citeseer      & 400 & 7e-3 & 1000 & 30 & 6.4957  & 1 & 3 \\
PubMed        & 400 & 7e-3 & 1000 & 30 & 13.1791 & 1 & 3 \\
CoAuthor CS   & 400 & 0.01 & 1000 & 30 & 6.8015  & 1 & 3 \\
Amazon Photos & 411 & 0.01 & 400  & 30 & 11.0201 & 1 & 3 \\
Actor         & 438 & 0.01 & 1000 & 30 & 14.7753 & 1 & 3 \\
Cornell       & 10  & 0.01 & 1000 & 30 & 6.9423  & 1 & 3 \\
Wisconsin     & 16  & 0.01 & 1000 & 30 & 6.9271  & 1 & 3 \\
\bottomrule
\end{tabular}%
}
\caption{Dataset details and experimental setup for the GCAN interpolation experiments.}
\label{tab:gat_gcn_hyperparameter}
\end{table}

For datasets that are not inherently graph-structured (e.g., CIFAR-10), we first compute the Euclidean distance between the feature vectors of each pair of nodes. An edge is then added between two nodes if their distance is below a predefined threshold.
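
One way to write this construction down, using notation introduced here purely for illustration: let $x_i$ denote the feature vector of node $i$ and let $\tau > 0$ denote the threshold. The edge set is then
\[
E = \bigl\{ \{i, j\} \;:\; i \neq j,\ \| x_i - x_j \|_2 < \tau \bigr\},
\]
which yields an undirected graph because the Euclidean distance is symmetric. This sketch assumes that self-loops are excluded and that the inequality is strict; neither detail is specified above.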

% In \Cref{fig:gat_gcn_plot}, we plot the accuracy of GCAN on different $\eta$ values, visualizing the result of \Cref{tab:gat_gcn_table}.

% \begin{figure*}[h! tbp]
%     \centering
%     \includegraphics[width=\textwidth]{lambda_graph_imgs/acc_vs_threshold.png}
%     \caption{Accuracies and confidence intervals of the proposed GCAN interpolation method on different datasets. ``Threshold'' on the $x$-axis refers to the $\eta$ hyperparameter in the GCAN interpolation method.
%     }
%     \label{fig:gat_gcn_plot}
% \end{figure*}

% In \Cref{fig:gat_gcn_plot_complexity}, we plot the classification accuracy as the fraction of training data increases. The conclusion is similar to that in \Cref{sec:experiments}.

% \begin{figure*}[h! tbp]
%     \centering
%     \includegraphics[width=\textwidth]{experiments/learning curves.png}
%     \caption{Accuracies and confidence intervals of the proposed GCAN interpolation method on different datasets as a function of fraction of total training set. For some datasets, we note that we can achieve high accuracies using only a fraction of the data, showing the robustness of our GCAN model.
%     }
%     \label{fig:gat_gcn_plot_complexity}
% \end{figure*}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% \subsection{Smoothing-Based Algorithm Family (\texorpdfstring{$\mathcal{F}_\lambda$}{F lambda})}

% In this section, we take the Smoothing-Based Algorithm Family ($\cF_\lambda$) as an example to show the necessity of tuning the hyperparameter in label propagation based algorithm families introduced in \Cref{sec:label_prop}.

% We take $100$ points from each dataset and label $20\%$ of the points, leaving the rest unsupervised. We find the unsupervised accuracy in each experiment and run it $30$ times to get a mean unsupervised accuracy and a $90\%$ confidence interval. Figure~\ref{fig:lambda_exp} shows the results of the unsupervised accuracy of binary classification on multiple datasets. 

% There can be a large amount of variation in the unsupervised accuracy as the hyperparameter value is varied, and accuracy does not increase or decrease monotonically. For some datasets such as Cora and Citeseer, we see that as the $\lambda$ parameter increases, the accuracy first drops and then increases dramatically. For CoAuthor CS and Actor, we see that the accuracy fluctuates but the mean is roughly constant across all $\lambda$ values. For PubMed, we see that lower values of $\lambda$ lead to higher accuracy. Therefore, we conclude that the different $\lambda$ hyperparameter value makes a large difference in the resulting classification accuracy, and the optimal value needs to be learned separately for each dataset.
%  % We also study the empirical sample complexity of tuning the parameter (Figure \ref{fig:new_lambda_experiment} in appendix).
%  \begin{figure*}[ht!]
%     \centering
%     \includegraphics[width=0.67\textwidth]{lambda_images/lambda_experiments.png}
%     \caption{Unsupervised Accuracy and a $90\%$ confidence interval plotted against various $\lambda$ values on a log scale.}
%     \label{fig:lambda_exp}
% \end{figure*}

% \subsection{Additional Experiments on GCN and GAT}

% \adcomment{i think we need to add more descriptions for this experiment}
% \begin{figure*}[h! tbp]
%     \centering
%     \includegraphics[width=0.8\textwidth]{experiments/GCN training curves.png}
%     \caption{Training and Validation accuracies over 2000 epochs on GCN models.}
%     \label{fig:GCN_training_curves}
% \end{figure*}

% \begin{figure*}[h! tbp]
%     \centering
%     \includegraphics[width=0.8\textwidth]{experiments/GAT training curves.png}
%     \caption{Training and Validation accuracies over 2000 epochs on GAT models.}
%     \label{fig:GAT_training_curves}
% \end{figure*}

% \begin{figure*}[h! tbp]
%     \centering
%     \includegraphics[width=0.66 \textwidth]{new_lambda_experiments.jpg}
%     \caption{Unsupervised Accuracy and confidence intervals vs. the number of subsamples required to learn the best $\lambda$ parameter on various datasets. For some datasets, we converge to the best $\lambda$ and thus the best unsupervised accuracy much more quickly (CIFAR10, Amazon Photos) while others such as Actor and CoAuthorCS require many more subsamples to learn the best $\lambda$ parameter.}
%     \label{fig:new_lambda_experiment}
% \end{figure*}

 
% \iffalse
% \section{Example Graphs}

% \begin{figure}[!htbp]
%     \centering
%     \begin{minipage}{0.45\textwidth}
%         \centering
%         \includegraphics[width=0.8\textwidth]{lambda_graph_imgs/lambda_graph.png}
%         \caption{Example Graph with Unlabeled Points}
%         \label{fig:lambda_exp1}
%     \end{minipage}
%     \hfill
%     \begin{minipage}{0.45\textwidth}
%         \centering
%         \includegraphics[width=0.8\textwidth]{lambda_graph_imgs/full_lambda_graph.png}
%         \caption{Same Example Graph with the true node labels on all nodes}
%         \label{fig:lambda_exp2}
%     \end{minipage}
%     \vfill
%     \begin{subfigure}{\textwidth}
%         \centering
%         \includegraphics[width=0.5\textwidth]{lambda_graph_imgs/lambda_acc_plot.png}
%         \caption{Accuracy on Unsupervised Nodes (Gray Nodes) on various $\lambda$ values.}
%         \label{fig:lambda_acc_plot}
%     \end{subfigure}
% \end{figure}

% \fi 

