\section{Introduction} \label{sec:intro}


Graph Neural Networks (GNNs) have significantly advanced graph data mining, demonstrating strong performance across various domains, including social platforms, e-commerce, transportation, bioinformatics, and healthcare \citep{hamilton2018inductiverepresentationlearninglarge,  kipf2017semisupervisedclassificationgraphconvolutional, wu2022graph, zhang2021graph}. In many real-world scenarios, graph data is inherently distributed due to the nature of data generation and collection processes \citep{zhou2020graph}. For example, data from social networks, healthcare systems, and financial institutions \citep{liu2019geniepath} is often generated by multiple independent entities, leading to fragmented and distributed graph structures. This distributed nature of graph data poses unique challenges when training GNNs, such as the need to address data privacy, ownership, and regulatory constraints \citep{zhang2021subgraph}.

Federated Learning (FL) emerges as a solution, allowing collaborative model training without centralized data sharing \citep{mcmahan2017communication, kairouz2021advances}. FL addresses data isolation issues and has been widely used in various applications, including computer vision and natural language processing \citep{li2020review}. However, applying FL to graph data introduces unique challenges, such as incomplete node neighborhoods and missing links across distributed subgraphs \citep{zhang2021subgraph}. These missing connections can degrade model performance and increase uncertainty, underscoring the need for robust uncertainty quantification techniques.


\begin{figure}[htbp]
        \includegraphics[width=\linewidth]{figures/overview_new2.png}
    \caption{\textbf{Overview of federated conformal prediction for graph-structured data.} A simplified scenario involving patient data distributed across three hospitals, highlighting both intra-client (solid lines) and inter-client (dashed lines) connections. In federated settings, inter-client links are often missing, despite their real-world presence, leading to fragmented subgraphs. The FedGNN model optimizes a global model through local updates on each client, while the centralized GNN model operates on the complete graph with all connections intact, serving as a performance benchmark. Missing inter-client links result in larger conformal prediction sets, as shown by \(C^1_{\alpha}(X_{\text{test}})\) (prediction set from FedGNN) and \(C^2_{\alpha}(X_{\text{test}})\) (prediction set from centralized GNN), illustrating how missing links affect model uncertainty.}
    \label{fig-overview}
\end{figure}

Conformal Prediction (CP) \citep{vovk2005algorithmic} offers a promising framework for uncertainty quantification with statistical guarantees. Given a user-specified miscoverage level \( \alpha \in (0, 1) \), CP uses calibration data to construct prediction sets for new instances that contain the true outcome with probability at least \( 1 - \alpha \).
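
As a brief illustration, the standard split conformal construction can be sketched as follows. The notation is introduced only for this sketch and is an assumption rather than the notation used in the remainder of the paper: \(s(x, y)\) denotes a nonconformity score, \((X_i, Y_i)_{i=1}^{n}\) the calibration pairs, and \(\hat{q}_{\alpha}\) the calibration threshold. Writing \(s_i = s(X_i, Y_i)\) and assuming \(\lceil (n+1)(1-\alpha) \rceil \le n\), let \(\hat{q}_{\alpha}\) be the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest value among \(s_1, \dots, s_n\), and define
\[
C_{\alpha}(x) = \bigl\{\, y : s(x, y) \le \hat{q}_{\alpha} \,\bigr\}.
\]
If the calibration pairs and the test pair \((X_{\text{test}}, Y_{\text{test}})\) are exchangeable, then
\[
\Pr\bigl( Y_{\text{test}} \in C_{\alpha}(X_{\text{test}}) \bigr) \ge 1 - \alpha .
\]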

While CP has been explored in natural language processing \citep{kumar2023conformal}, computer vision \citep{angelopoulos2020uncertainty}, federated learning \citep{lu2023federated}, and GNNs \citep{zargarbashi2023conformal, huang2024uncertainty}, its application in federated graph learning remains underexplored. A primary challenge is that the \emph{exchangeability} assumption, which is critical for CP's validity, may fail to hold for partitioned graph data because of data heterogeneity across clients.

In this paper, we investigate conformal prediction within a federated graph learning framework, where multiple clients, each with distinct local data distributions \(P_k\) over node-feature-label pairs \((x, y)\), collaboratively train a shared global model while experiencing missing neighbor information. Our objective is to construct prediction sets with marginal coverage guarantees for unseen data drawn from a global test distribution \(Q_{\text{test}} = \sum_{k=1}^K p_k P_k\), where \(p_k\) denotes the mixing weight for client \(k\). However, heterogeneity across the client distributions \(P_k\) can violate the exchangeability assumption required by conformal prediction, undermining the validity of coverage guarantees and leading to larger, less informative prediction sets~\citep{huang2024uncertainty}. This issue is further compounded by the absence of inter-client links, which limits structural context and increases uncertainty due to incomplete neighborhood information, as illustrated in Figure~\ref{fig-overview}.
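
Written out, the coverage target described above takes the following form, where the subscript on \(\Pr\) is shorthand introduced here for the distribution of the test pair, and the probability is also taken over the calibration data used to build \(C_{\alpha}\):
\[
\Pr_{(X_{\text{test}}, Y_{\text{test}}) \sim Q_{\text{test}}}\bigl( Y_{\text{test}} \in C_{\alpha}(X_{\text{test}}) \bigr) \ge 1 - \alpha .
\]
Because \(Q_{\text{test}}\) is a mixture, the left-hand side decomposes over clients as
\[
\sum_{k=1}^{K} p_k \, \Pr_{(X_{\text{test}}, Y_{\text{test}}) \sim P_k}\bigl( Y_{\text{test}} \in C_{\alpha}(X_{\text{test}}) \bigr),
\]
so a marginal guarantee of this kind constrains the \(p_k\)-weighted average of the per-client coverage rates rather than the coverage on each client individually.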

We extend the theoretical framework of \emph{partial exchangeability} to graphs within the federated learning setting, addressing the challenges posed by data heterogeneity across the client subgraphs. Our analysis shows that missing links inflate the size of the conformal prediction sets. To counteract this inefficiency, we introduce a novel framework that generates missing links across clients, thereby reducing the size of the CP sets.

Our main contributions are summarized as follows:

\begin{itemize}
\item We extend Conformal Prediction to federated graph settings, establish the necessary conditions for CP validity, and derive theoretical statistical guarantees.
\item We analyze how the absence of inter-client links inflates conformal prediction set sizes and propose a method to mitigate this inefficiency through local subgraph completion.
\item We demonstrate the effectiveness of our approach through empirical evaluation on four benchmark datasets, showing improved efficiency of CP in federated graph scenarios.
\end{itemize}