\section{Related Works}

\paragraph{Self-supervised Learning for Graphs} The success of self-supervised contrastive learning in computer vision~\citep{oord2018representation, hjelm2018learning, grill2020bootstrap} inspired the development of contrastive graph SSL methods based on mutual information maximization. For example, DGI~\citep{velickovic2019deep} maximizes the mutual information between local patch representations and a global graph representation, using negative examples obtained by shuffling node features. GRACE~\citep{grace} maximizes the mutual information between the node representations of two corrupted graph views by contrasting them with both intra- and inter-view negatives. BGRL~\citep{thakoor2021large} adapts BYOL~\citep{grill2020bootstrap} to perform contrastive learning without negative examples. InfoGCL~\citep{infogcl} proposes a contrastive framework that retains task-relevant information at different levels and minimizes information loss during graph representation learning. MVGRL~\citep{mvgrl} uses graph diffusion to generate an alternative graph view and maximizes the mutual information between the local representations of one view and the global representation of the other. Graph SSL methods based on reconstruction objectives have also been proposed. For instance, GATE~\citep{gate} uses a stacked self-attention encoder/decoder architecture to reconstruct both node features and graph structure, while GraphMAE~\citep{graphmae} focuses on feature reconstruction with a masked graph autoencoder. More recently, predictive graph SSL methods have emerged: LaGraph~\citep{lagraph} learns by predicting unobserved latent graphs, and CCA-SSG~\citep{ccassg} uses a feature prediction objective inspired by canonical correlation analysis. Contrastive graph SSL methods that require no data augmentation, such as AF-GCL~\citep{augmentationfreecont} and AFGRL~\citep{augmentationfree}, have also been proposed.
However, both methods select the nodes with the most similar representations as positive instances, whereas our method leverages node proximity measures for node representation learning.
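
To make the contrastive objectives above concrete, the per-node loss of GRACE~\citep{grace} can be sketched as follows. The notation is ours and only approximates the original presentation: $\mathbf{u}_i$ and $\mathbf{v}_i$ denote the embeddings of node $i$ in the two corrupted views, $\theta(\cdot,\cdot)$ is the cosine similarity between projected embeddings, $\tau$ is a temperature parameter, and $N$ is the number of nodes. The first sum in the denominator collects the inter-view negatives and the second the intra-view negatives:
\[
\ell(\mathbf{u}_i, \mathbf{v}_i) = -\log \frac{e^{\theta(\mathbf{u}_i, \mathbf{v}_i)/\tau}}{e^{\theta(\mathbf{u}_i, \mathbf{v}_i)/\tau} + \sum_{k \neq i} e^{\theta(\mathbf{u}_i, \mathbf{v}_k)/\tau} + \sum_{k \neq i} e^{\theta(\mathbf{u}_i, \mathbf{u}_k)/\tau}},
\]
\[
\mathcal{J} = \frac{1}{2N} \sum_{i=1}^{N} \left[ \ell(\mathbf{u}_i, \mathbf{v}_i) + \ell(\mathbf{v}_i, \mathbf{u}_i) \right],
\]
where $\mathcal{J}$ is minimized during training. Negative-free methods such as BGRL drop the negative terms entirely, while augmentation-free methods change how the positive pair is chosen.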

\paragraph{Node Proximity} A variety of node proximity measures have been proposed, including the heat kernel~\citep{heat_chung}, PageRank~\citep{page1999pagerank}, cycle-free effective conductance (CFEC)~\citep{cfec}, the Katz index~\citep{katz}, and SimRank~\citep{simrank}. Previous works have leveraged node proximity measures for learning on graph data. \citet{link_predict} uses different node proximity measures for link prediction in social networks. \citet{link_predict2} combines graph proximity measures with the weights of existing links in social networks to improve link prediction accuracy. \citet{proximity} computes structural and positional node embeddings using well-established proximity measures.
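
For reference, three of these measures admit simple closed forms. The following is a minimal sketch in our own notation, assuming an undirected graph without isolated nodes, with adjacency matrix $\mathbf{A}$, diagonal degree matrix $\mathbf{D}$, and column-stochastic transition matrix $\mathbf{T} = \mathbf{A}\mathbf{D}^{-1}$; the cited works may use other normalizations (e.g., symmetric ones). Entry $(i,j)$ of each matrix scores the proximity between nodes $i$ and $j$:
\[
\mathbf{\Pi}^{\mathrm{PPR}} = \alpha \left(\mathbf{I} - (1-\alpha)\mathbf{T}\right)^{-1} = \alpha \sum_{k=0}^{\infty} (1-\alpha)^k \mathbf{T}^k,
\]
\[
\mathbf{H} = \exp\left(-t(\mathbf{I} - \mathbf{T})\right) = e^{-t}\sum_{k=0}^{\infty} \frac{t^k}{k!}\mathbf{T}^k,
\]
\[
\mathbf{K} = \sum_{k=1}^{\infty} \beta^k \mathbf{A}^k = (\mathbf{I} - \beta\mathbf{A})^{-1} - \mathbf{I},
\]
where $\alpha \in (0,1)$ is the restart probability of the personalized variant of PageRank, $t > 0$ is the diffusion time of the heat kernel, and the Katz series converges only when $\beta < 1/\rho(\mathbf{A})$, with $\rho(\mathbf{A})$ the spectral radius of $\mathbf{A}$. Each measure weights length-$k$ walks (through $\mathbf{T}^k$ or $\mathbf{A}^k$) by a coefficient that decays with $k$, so nearby nodes receive higher proximity scores.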

% The early work in graph self-supervised learning was inspired by contrastive learning algorithms in computer vision~\citep{oord2018representation, hjelm2018learning, grill2020bootstrap}. DGI~\citep{velickovic2019deep} leverages the idea of Deep InfoMax (DIM) to maximize the mutual information between patch representations and the corresponding high-level summaries of graphs. Motivated by SimCLR~\citep{chen2020simple}, GRACE~\citep{zhu2020deep} generates two graph views through more complex graph augmentations and learns node representations by maximizing the similarity of node representations across the two views. BGRL~\citep{thakoor2022largescale} borrows the idea of BYOL~\citep{grill2020bootstrap} and builds only positive sample pairs to learn node representations, addressing the scalability bottleneck. A more recent method, CCA-SSG~\citep{zhang2021canonical}, follows a similar idea: it constructs two views of the graph and optimizes a feature-level objective. However, all of the above methods rely on multiple graph views and graph augmentations to build positive/negative node pairs.

% \paragraph{Graph Diffusion} Graph diffusion has been used extensively in search engines to rank web pages~\citep{page1999pagerank, heat_chung}. Diffusion has also been applied to graph learning, but previous works focus on modifying the GNN architecture using diffusion. For example, \citet{gasteiger_diffusion_2019} propose graph diffusion convolution (GDC), which replaces the normalized adjacency matrix in GCN with a diffusion matrix to expand the receptive field. Personalized propagation of neural predictions (PPNP)~\citep{ppnp} is an improved propagation scheme based on personalized PageRank (PPR). PPRGo~\citep{bojchevski2020pprgo} leverages an efficient approximation of PPR to scale PPNP to large graphs without compromising accuracy.