\section{Introduction}

Graph Neural Networks (GNNs)~\citep{welling2016semi, hamilton2017inductive,chen2022bag,duan2022a} are neural network architectures that learn expressive representations from graph-structured data. GNNs have shown great potential in a wide range of applications, including social networks~\citep{fan2019graph, STGSN,liu2021exact}, recommendation systems~\citep{wu2020graph, chang2021sequential,chen2022tinykg}, and drug discovery~\citep{chen2018rise,xiong2019pushing,zhou2019auto}.

\paragraph{The Need for Self-Supervised Learning:}
Supervised GNN training requires large amounts of labeled data, which are prohibitively expensive to obtain in important fields such as biochemistry~\citep{xiong2019pushing}. As an alternative, Self-Supervised Learning (SSL) strategies do not rely on labels and have shown promising potential in graph learning. Prior SSL approaches such as DGI~\citep{velickovic2019deep}, GRACE~\citep{grace}, and BGRL~\citep{thakoor2021large} learn representations that transfer to downstream tasks such as academic paper categorization, molecule classification, and product recommendation.

\paragraph{Problems of Existing Graph SSL Approaches:} In this paper, we identify two problems with current graph SSL approaches. First, competitive graph SSL approaches rely on corruption techniques that perturb node attributes or the adjacency matrix. These corruption techniques are inspired by data augmentation tricks from computer vision~\citep{imageaugment}. However, unlike an augmented image, a corrupted graph may not preserve the original semantics at the node or graph level. For example, dropping a single edge from a molecular graph removes a chemical bond and can yield a different molecule. As a result, the encoder is trained toward a flawed learning goal and may fail to learn meaningful representations. Second, existing graph SSL approaches need to compute multiple views of the graph, which increases memory consumption and computational cost during training. This inefficiency is exacerbated when training on large graphs under a limited memory budget.

\vspace{1em}

Given these limitations of current graph SSL approaches, a natural question arises:
\begin{center}
{\it
Can we design a corruption-free, single-view graph SSL approach that achieves competitive performance?
}
\end{center}

In this paper, we answer this question in the affirmative by proposing Proximity Divergence Minimization ({\method}), a corruption-free, single-view graph SSL approach. Our contributions are summarized as follows:


\begin{enumerate}
    \item We propose a novel graph SSL framework that uses node proximity in a single, uncorrupted graph view as the learning target for the similarity between node representations. Because it requires no corruption, the resulting learning objective is natural and more informative, and it achieves significantly better accuracy with minimal tuning. Because it computes only a single view, our approach is much more memory-efficient than previous methods and is able to scale to large graphs that are impractical for multi-view methods.
    \item We extend {\method} to large-scale graphs by leveraging recent advances in efficient graph training. This enables SSL on large-scale graphs for which existing SSL methods are difficult to tune and time-consuming to train.
    \item We demonstrate the effectiveness of {\method}, which achieves state-of-the-art accuracy on a variety of real-world graph datasets. In particular, compared to the previous best methods, our approach improves absolute accuracy by 2.6\%, 4.55\%, and 3.04\% on PubMed, ogbn-proteins, and ogbn-products, respectively.
\end{enumerate}
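
To make the idea of using proximity as a learning target more concrete, we give one generic, illustrative formulation; it is an assumption for exposition and not necessarily the exact objective of {\method}. Suppose a graph with $N$ nodes, a row-stochastic proximity matrix $\mathbf{P} \in \mathbb{R}^{N \times N}$ (assumed here to come from some graph diffusion, e.g., personalized PageRank), and node representations $\mathbf{z}_i = f_\theta(\mathbf{X}, \mathbf{A})_i$ computed by a GNN encoder $f_\theta$ in a single forward pass over the uncorrupted graph. Taking $\mathrm{sim}(\cdot,\cdot)$ as cosine similarity and $\tau > 0$ as a temperature (both hypothetical choices), each node's representation similarities can be turned into a distribution
\[
q_{ij} = \frac{\exp\big(\mathrm{sim}(\mathbf{z}_i, \mathbf{z}_j)/\tau\big)}{\sum_{k=1}^{N} \exp\big(\mathrm{sim}(\mathbf{z}_i, \mathbf{z}_k)/\tau\big)},
\]
and the encoder can be trained to minimize the divergence between proximity and similarity,
\[
\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{KL}\big(\mathbf{P}_{i,:} \,\|\, \mathbf{q}_{i}\big) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} P_{ij}\log\frac{P_{ij}}{q_{ij}},
\]
with the convention $0 \log 0 = 0$. In this sketch, no corrupted view is ever constructed, and every $\mathbf{z}_i$ comes from the same single view, which is the property that distinguishes single-view training from multi-view contrastive methods.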

The remainder of this paper is organized as follows. Section~\ref{sec:graph_ssl} introduces the graph SSL problem and the two major drawbacks of existing SSL methods. Section~\ref{sec:cursive} presents the motivation behind our approach and the details of the proposed method. Section~\ref{sec:exp} reports the setup and results of our extensive experimental evaluation.