% \vspace{-1mm}
\section{Related Work} 


\vspace{2mm}
\noindent\textbf{Graph self-supervised learning.}
% Self-supervised contrastive learning methods learn representations of data points by maximizing the mutual information between different views of the same data point, and minimizing agreement between differently augmented views of different examples \citep{bachman2019learning, ye2019unsupervised,wu2018unsupervised,chen2020simple,sohn2020fixmatch}. 

Graph self-supervised learning methods have become a powerful tool for learning representations without any labels, and contrastive learning (CL) is the most successful and widely used paradigm among them.
DGI \citep{velickovic2019deep} and GMI \citep{peng2020graph} contrast graph and node representations within a single view of the original graph. More recent methods contrast global or local representations across two augmented views. \textsc{GraphCL} \citep{you2020graph} generates graph augmentations via subgraph sampling, node dropping, and edge perturbation, and contrasts the representations of the augmented graphs. GCC \citep{qiu2020gcc} samples and contrasts subgraphs of the original graph. MVGRL \citep{hassani2020contrastive} augments the graph with graph diffusion and contrasts node representations of one view with graph representations of the other.
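The augmentations mentioned above can be illustrated on a toy graph. The following is a minimal sketch, assuming an undirected graph stored as a list of \texttt{(u, v)} edge pairs and node features stored as a NumPy array; the helper names \texttt{drop\_edges}, \texttt{mask\_features}, and \texttt{drop\_nodes} and the drop probabilities are hypothetical, and the code does not reproduce the exact samplers of \textsc{GraphCL} or \textsc{GRACE} \citep{zhu2020deep}.

```python
import numpy as np

def drop_edges(edges, p, rng):
    """Edge perturbation (removal only): keep each (u, v) pair independently with probability 1 - p."""
    keep = rng.random(len(edges)) >= p
    return [e for e, k in zip(edges, keep) if k]

def mask_features(x, p, rng):
    """Feature masking: zero each feature dimension (column) for all nodes with probability p."""
    keep = rng.random(x.shape[1]) >= p
    return x * keep  # the (F,) boolean mask broadcasts over the (N, F) feature matrix

def drop_nodes(edges, num_nodes, p, rng):
    """Node dropping: remove each node with probability p together with its incident edges."""
    kept = {v for v in range(num_nodes) if rng.random() >= p}
    return kept, [(u, v) for (u, v) in edges if u in kept and v in kept]

# Two stochastic views of a toy 4-cycle with 3-dimensional features.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = np.ones((4, 3))
view1 = (drop_edges(edges, 0.2, rng), mask_features(x, 0.3, rng))
view2 = (drop_edges(edges, 0.2, rng), mask_features(x, 0.3, rng))
```

Because each call draws fresh random numbers, \texttt{view1} and \texttt{view2} generally differ, which is what gives the contrastive objective two distinct views to compare.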
Contrasting local node representations has been shown to achieve state-of-the-art performance. \textsc{GRACE} \citep{zhu2020deep} contrasts node representations in two graph views augmented with feature masking and edge removal.
GCA \citep{zhu2021graph} extends it with adaptive augmentations that preferentially drop less important edges and features, as measured by node centrality and feature importance metrics. \cite{zhu2021empirical} conducted a thorough empirical study of the combined effect of different augmentations.
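To make node-level contrast concrete, the sketch below implements a symmetric InfoNCE-style loss in the form we understand \textsc{GRACE} to use: the same node in the other view is the positive, and all other nodes in both views are negatives. It is a simplified illustration rather than the reference implementation; in particular, it omits the learnable projection head, and the names \texttt{node\_contrastive\_loss} and \texttt{tau} (the temperature) are our own.

```python
import numpy as np

def _l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def _one_direction(u, v, tau):
    inter = np.exp(u @ v.T / tau)  # u_i vs. v_k: diagonal = positive pairs, rest = inter-view negatives
    intra = np.exp(u @ u.T / tau)  # u_i vs. u_k: off-diagonal = intra-view negatives
    pos = np.diag(inter)
    denom = inter.sum(axis=1) + intra.sum(axis=1) - np.diag(intra)
    return -np.log(pos / denom)

def node_contrastive_loss(z1, z2, tau=0.5):
    """Symmetric node-level InfoNCE; row i of z1 and z2 embeds the same node in two views."""
    u, v = _l2_normalize(z1), _l2_normalize(z2)
    return 0.5 * (_one_direction(u, v, tau) + _one_direction(v, u, tau)).mean()

# Identical views give a lower loss than views whose rows (nodes) are misaligned.
z = np.eye(4)
aligned = node_contrastive_loss(z, z)
shuffled = node_contrastive_loss(z, z[[1, 2, 3, 0]])
```

Each per-node term is $-\log$ of the positive similarity over the sum of the positive, inter-view, and intra-view negative similarities; computing these similarity matrices costs $O(N^2)$ in the number of nodes $N$.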
Due to the complexity of negative sampling in graph data, negative-sample-free
contrastive objectives have also been studied. Among existing methods, BGRL \citep{thakoor2021large} uses a bootstrapped latent loss, and GBT \citep{bielak2021graph} uses the Barlow Twins loss.
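The Barlow Twins objective avoids negatives altogether: it standardizes each embedding dimension across nodes, computes the cross-correlation matrix $C$ between the two views, and penalizes $\sum_i (1 - C_{ii})^2 + \lambda \sum_{i \neq j} C_{ij}^2$. A minimal NumPy sketch follows; the default $\lambda$ (\texttt{lam}) is the value suggested for the original image-domain Barlow Twins and is only an assumption here, not the setting used by GBT.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3, eps=1e-8):
    """Barlow Twins objective on (N, D) embeddings of the same N nodes under two views."""
    n = z1.shape[0]
    # Standardize every embedding dimension over the N nodes.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    c = z1.T @ z2 / n  # (D, D) cross-correlation matrix
    diag = np.diag(c)
    invariance = ((1.0 - diag) ** 2).sum()            # push C_ii toward 1
    redundancy = (c ** 2).sum() - (diag ** 2).sum()  # push C_ij (i != j) toward 0
    return invariance + lam * redundancy

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 8))
same = barlow_twins_loss(z, z)
flipped = barlow_twins_loss(z, -z)
```

For identical views the invariance term vanishes, while negating one view drives every $C_{ii}$ to $-1$ and adds roughly $4D$ to the loss for $D$ embedding dimensions.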
Recently, \cite{liu2022revisiting} proved that, under homophily, the representations learned by graph CL (GCL) essentially encode low-frequency information, owing to frequency invariance. They further proposed SpCo, a general GCL framework that searches for the optimal contrastive graph views.
Existing graph CL methods explicitly augment the input graph and contrast the augmented graph representations obtained with low-pass GNN-based encoders. In doing so, they only capture the similarity of nodes within a neighborhood and hence perform poorly on graphs with heterophily. To address this, HGRL \citep{chen2022towards}, a recent graph self-supervised learning method, first rewires the entire graph, dropping edges that connect nodes in different classes and adding edges that connect nodes in the same class.
Notably, instead of a GNN encoder, HGRL uses an MLP to avoid low-pass aggregation over edges connecting different classes. It also learns different weights for edges in the multi-hop neighborhood to capture more information from the graph.
% \hy{\citep{he2023contrastive} introduces NeCo, a GCL method that constructs a homophily neighborhood by evaluating the sum of the similarities between an anchor node and its neighbors.  NeCo then samples positive pairs from this updated homophily neighborhood for contrastive learning.} 
SP-GCL \citep{wang2022can} treats nodes in the $T$-hop neighborhood of a node that have high feature similarity to it as positive pairs, without using any explicit augmentation.
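The positive-selection idea can be sketched as follows, assuming an adjacency-list graph and cosine similarity on raw node features. The helpers \texttt{t\_hop\_neighbors} and \texttt{select\_positives} and the top-$k$ rule are hypothetical simplifications for illustration and do not reproduce the exact sampling procedure of SP-GCL.

```python
from collections import deque
import numpy as np

def t_hop_neighbors(adj, anchor, t):
    """All nodes within t hops of `anchor` (BFS over an adjacency list), excluding the anchor."""
    seen = {anchor}
    queue = deque([(anchor, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == t:
            continue  # do not expand beyond t hops
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, depth + 1))
    seen.discard(anchor)
    return sorted(seen)

def select_positives(adj, x, anchor, t=2, k=1):
    """Pick the k nodes within t hops whose features are most cosine-similar to the anchor."""
    cands = t_hop_neighbors(adj, anchor, t)
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = xn[cands] @ xn[anchor]
    return [cands[i] for i in np.argsort(-sims)[:k]]

# Heterophilous path 0-1-2-3: node 0's direct neighbor (1) is dissimilar, its 2-hop node (2) is similar.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [1.0, 0.0]])
```

In this toy path graph, restricting positives to direct neighbors would pair node 0 with a dissimilar node, whereas widening the search to $T=2$ hops recovers a similar positive (node 2).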
In contrast, we leverage low-pass and high-pass graph filters in the same GNN-based encoder to
capture and contrast the similarity and dissimilarity of nodes with their neighborhoods.
This allows our method to achieve state-of-the-art performance under heterophily.
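For intuition on the two filter types, the sketch below assumes the GCN-style normalized adjacency $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ as the low-pass filter and $I - \hat{A}$ as the high-pass filter, where $\tilde{D}$ is the degree matrix of $A + I$. These are common choices used purely for illustration; they are not meant to reproduce the exact filters of our encoder.

```python
import numpy as np

def normalized_adjacency(a):
    """GCN-style D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]

def low_pass(a, x):
    """Low-pass filtering: mix each node's signal with its neighbors' (smoothing)."""
    return normalized_adjacency(a) @ x

def high_pass(a, x):
    """High-pass filtering with (I - A_hat): keep what differs from the neighborhood average."""
    return x - normalized_adjacency(a) @ x

a = np.array([[0.0, 1.0], [1.0, 0.0]])  # a single edge between nodes 0 and 1
x_homo = np.array([[1.0], [1.0]])        # neighbors agree (homophily)
x_hetero = np.array([[1.0], [-1.0]])     # neighbors disagree (heterophily)
```

On this single edge, low-pass filtering preserves the homophilous signal but erases the heterophilous one, whereas high-pass filtering does the opposite.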

\vspace{2mm}
\noindent\textbf{Graph (semi-)supervised learning under heterophily.}
%To address over-smoothing issue of GNNs, 
In the supervised setting,
recent methods propose alternative aggregation schemes that better fit graphs with heterophily. \cite{zhu2021interpreting} designed a unified framework for interpreting GNN propagation and, based on it, proposed GNN-LF and GNN-HF, which use low-pass and high-pass filtering kernels with learnable weights, respectively, to preserve information at different frequencies.
FAGCN \citep{bo2021beyond} and FBGNN \citep{luan2020complete} train two \textit{separate} encoders to capture the low-pass and high-pass graph signals, and then rely on labels to learn relatively complex mechanisms for combining the encoders' outputs.
However, learning how to combine these outputs is highly sensitive to label quality, which makes such methods impractical for unsupervised contrastive learning, where labels are not available.
Unlike these supervised methods, we apply high-pass and low-pass filters to different subgraphs and contrast the resulting high-pass and low-pass filtered node views in a self-supervised manner, without any labels, rather than learning a label-driven combination of the filtered signals of different encoders.