\section{Introduction}
Semi-supervised learning is a powerful paradigm in machine learning that reduces the dependence on expensive and hard-to-obtain labeled data by using a combination of labeled and unlabeled data. This has become increasingly relevant in the era of large language models, where an extremely large amount of labeled training data is needed. Many techniques have been proposed in the literature to exploit the structure of unlabeled data, including widely used graph-based semi-supervised learning algorithms \citep{Blum2001LearningFL,zhu2003semi,zhou2003learning,delalleau2005efficient,chapelle2009semi}. More recently, there has been increasing interest in developing effective neural network architectures for graph-based learning \citep{kipf2016semi,velivckovic2017graph,iscen2019label}. However, different algorithms, architectures, and hyperparameter values perform well on different datasets \citep{dwivedi2023benchmarking}, and there is no principled way of selecting the best approach for the data at hand. In this work, we initiate the study of theoretically principled techniques for learning hyperparameters from infinitely large semi-supervised learning algorithm families.

In graph-based semi-supervised learning, the graph nodes consist of labeled and unlabeled data points, and the graph edges denote feature similarity between the nodes. There are several classical ways of defining a graph-based regularization objective that depends on the available and predicted labels as well as the graph structure. Optimizing this objective yields the predicted labels, and the accuracy of the predictions depends on the chosen objective. The performance of the same objective may vary across datasets. By studying parameterized families of objectives, we can learn to design the objective that works best for a given data domain. Similarly, modern deep learning-based techniques often have several candidate architectures and hyperparameter choices, which are typically optimized manually for each application domain.
Recent work has considered the problem of learning the graph hyperparameter used in semi-supervised learning \citep{balcan2021data,fatemi2021slaps} but leaves the problem of selecting the algorithm hyperparameter wide open.
In this paper, we take important initial steps to build the theoretical foundations of algorithm hyperparameter selection in graph-based semi-supervised learning. 
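
As a concrete illustration of such an objective, consider the local and global consistency formulation of \citet{zhou2003learning}. The notation below ($W$, $D$, $S$, $Y$, $F$, $\mu$, $\alpha$) is introduced only for this sketch and is an assumption that may differ from the parameterization used in later sections. Given a symmetric nonnegative weight matrix $W$ with diagonal degree matrix $D$, an initial label matrix $Y$ (with zero rows for unlabeled nodes), and a trade-off parameter $\mu > 0$, the predicted scores $F$ minimize
\[
Q(F) = \frac{1}{2}\Biggl(\sum_{i,j=1}^{n} W_{ij}\,\Biggl\|\frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}}\Biggr\|^2 + \mu \sum_{i=1}^{n} \|F_i - Y_i\|^2\Biggr),
\]
and setting the gradient to zero gives the closed form
\[
F^* = (1-\alpha)\,(I - \alpha S)^{-1} Y, \qquad S = D^{-1/2} W D^{-1/2}, \qquad \alpha = \frac{1}{1+\mu} \in (0,1).
\]
The positive factor $(1-\alpha)$ does not change which class attains the largest score, so the predicted labels depend on the data only through $(I - \alpha S)^{-1} Y$. In this sketch, $\alpha$ plays the role of an algorithm hyperparameter: each value of $\alpha$ defines a different member of a one-parameter family of label propagation algorithms.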
%

Note that we focus specifically on algorithm hyperparameters, such as self-loop weights, leaving optimization hyperparameters like learning rates outside the scope of this study.\looseness-1


\subsection{Contributions}

\begin{itemize}[leftmargin=*]
    \item We study hyperparameter tuning in three canonical label propagation-based semi-supervised learning algorithms: the local and global consistency~\citep{zhou2003learning}, the smoothing-based~\citep{delalleau2005efficient}, and a novel normalized adjacency matrix-based algorithm. We prove new $O\left(\log n\right)$ pseudo-dimension upper bounds for all three families, where $n$ is the number of graph nodes. Our proofs rely on a unified template based on determinant evaluation and root-counting, which may be of independent interest.
    \item We provide matching $\Omega\left(\log n\right)$ pseudo-dimension lower bounds for all three aforementioned families. Our proof involves novel constructions of a class of partially labeled graphs that exhibit fundamental limitations in tuning the label propagation algorithms. We note that our lower bound proofs are particularly subtle and technically challenging, and involve the design of a carefully constructed set of problem instances and hyperparameter thresholds that shatter these instances.
    \item Next, we consider modern graph neural networks (GNNs). We prove a new Rademacher complexity bound for tuning the weight of self-loops in Simple Graph Convolution (SGC), a popular architecture proposed by \citet{wu19simplifying}.
    \item We propose an architecture (GCAN) in which a hyperparameter $\eta$ interpolates between two canonical GNN architectures: graph convolutional networks (GCNs) and graph attention networks (GATs). We bound the Rademacher complexity of tuning $\eta$. Because the parameter dimensions differ, the Rademacher complexities of SGC and GCAN depend differently on the feature dimension $d$: $\sqrt{d}$ for SGC versus $d$ for GCAN.
    \item We conduct experiments to demonstrate the effectiveness of our hyperparameter selection framework.
\end{itemize}
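
To give a rough sense of why determinant evaluation and root-counting arise for label propagation, the following is a minimal sketch for binary labels under assumed notation: a symmetric normalized adjacency matrix $S = D^{-1/2} W D^{-1/2}$, a label vector $y \in \{-1,0,1\}^n$ with zeros for unlabeled nodes, and a hyperparameter $\alpha \in (0,1)$. It is an illustration only and not necessarily the exact argument used in the formal proofs. By Cramer's rule,
\[
(I - \alpha S)^{-1} y = \frac{\mathrm{adj}(I - \alpha S)\, y}{\det(I - \alpha S)}.
\]
The eigenvalues of $S$ lie in $[-1,1]$, so $\det(I - \alpha S) > 0$ on $(0,1)$, and the sign of the $i$-th score equals the sign of the polynomial
\[
p_i(\alpha) = \bigl[\mathrm{adj}(I - \alpha S)\, y\bigr]_i, \qquad \deg p_i \le n-1.
\]
Each $p_i$ that is not identically zero has at most $n-1$ roots, so the predicted labels on all $n$ nodes, and hence the 0-1 loss, are piecewise constant in $\alpha$ with at most $n(n-1)+1$ pieces. Across $m$ problem instances there are at most $m\,n(n-1)+1$ distinct loss patterns, and shattering $m$ instances requires
\[
2^m \le m\,n(n-1) + 1,
\]
which forces $m = O(\log n)$.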

\subsection{Related Work}
\paragraph{Graph-Based Semi-supervised Learning} Semi-supervised learning is a popular machine learning paradigm with significant theoretical interest~\citep{zhou2003learning,delalleau2005efficient,Balcan2010ADM,garg2020generaliz}. Classical algorithms focus on label propagation-based techniques, such as those of \citet{zhou2003learning}, \citet{zhu2003semi}, and many more.
In recent years, graph neural networks (GNNs) have become increasingly popular in a wide range of application domains~\citep{kipf2016semi,velivckovic2017graph,iscen2019label}.
Many different architectures have been proposed, including graph convolutional networks, graph attention networks, and message passing networks \citep{dwivedi2023benchmarking}.
Both label propagation-based algorithms and neural network-based algorithms are practically useful~\citep{Balcan2005PersonII,kipf2016semi}. For example, although GNN-based algorithms are more prevalent in applications, \citet{huang2020combininglabelpropagationsimple} show that modifications to label propagation-based algorithms can outperform GNNs. For node classification with GNNs, many works study generalization guarantees for tuning the network weights~\citep{oono2021optimizationgeneralizationanalysistransduction, esser2021learningtheorysometimesexplain, tang2023understandinggeneralizationgraphneural}. In contrast, we study the tuning of the \textit{hyperparameters} related to the GNN architecture.

\paragraph{Hyperparameter Selection} Hyperparameters, such as the self-loop weight, play an important role in the performance of both classical methods and GNNs.
In general, hyperparameter tuning is performed on a validation dataset and follows the same procedure: determine which hyperparameters to tune, and then search within their domain for the combination of values with the best performance \citep{yu2020hyperparameteroptimizationreviewalgorithms}. Many methods have been proposed to search the parameter space efficiently, such as grid search, random search \citep{JMLR:v13:bergstra12a}, and Bayesian optimization \citep{Mockus1974bayesian,Mockus1978application,jones1998efficient}. A few existing works investigate the theoretical aspects of these methods, such as generalization guarantees and the complexity of the algorithms.

A recently introduced paradigm called data-driven algorithm design is useful for obtaining formal guarantees for hyperparameter tuning~\citep{balcan2020data,sharma2024data}. In particular, \citet{balcan2024provablytuningelasticnetinstances,Balcan2023NewBF} study the regularization hyperparameter in the ElasticNet in statistical settings, and \citet{balcan2024learning} study learning decision tree algorithms. For unsupervised learning, \citet{balcan2019datadrivenclusteringparameterizedlloyds,balcan2024algorithm} consider a parameterized family of clustering algorithms and analyze the sample and computational complexity of learning the parameters. For semi-supervised learning, a recent line of work \citep{balcan2021data,sharma2023efficiently} considers the problem of learning the best graph hyperparameter from a set of problem instances drawn from a data distribution. Another recent work \citep{balcan2025samplecomplexitydatadriventuning} investigates the kernel hyperparameters in GNN architectures and derives generalization guarantees through the pseudo-dimension. However, no existing work theoretically studies the tuning of the labeling \textit{algorithm hyperparameter} in semi-supervised learning, or investigates data-dependent bounds on hyperparameter selection in deep semi-supervised learning algorithms through Rademacher complexity. We note that in this work we focus
on the statistical learning setting (i.e., the problem instances are drawn from a fixed, unknown distribution),
but it would be an interesting direction to study online tuning of the hyperparameters using tools from prior
work~\citep{Sharma2019LearningPL,Sharma2024NoIR,Sharma2025OfflinetoonlineHT}.