% \vspace{-3mm}
\begin{figure}
    \centering
    \begin{subfigure}{0.45\columnwidth}
        \centering
        \includegraphics[width=\textwidth]{Fig/chameleon_OMG.pdf}
        \caption{Homophilic neighborhood}
    \end{subfigure}
    \hspace{0.5cm} % Adjusted space between subfigures
    \begin{subfigure}{0.45\columnwidth}
        \centering
        \includegraphics[width=\textwidth]{Fig/chameleon_OTG.pdf}
        \caption{Heterophilic neighborhood}
    \end{subfigure}
    \caption{Chameleon ($\beta\!\!=\!\!0.23$). Heterophilic graphs contain neighborhoods with homogeneous \& heterogeneous labels. \looseness=-1}
    \label{fig:mix_graph}
    \vspace{-3mm}
\end{figure}

\vspace{-4mm}
\section{Graph CL under Heterophily}
% \vspace{-4mm}
In this section, we first discuss the challenges of having a universal method for graph CL under heterophily and homophily. 
Then, we present our approach to overcome these challenges and learn high-quality representations.

\noindent\textbf{Challenges.}
% Designing a universal method that can learn high-quality representations under both homophilic and heterophily without labels, is very challenging. 
Under heterophily, where nodes in a neighborhood may have different labels, aggregating the node representations in a neighborhood fades out the dissimilarity between the representations of nodes in different classes, and contrasting those augmented representations further makes them indistinguishable. Labels can help guide an appropriate aggregation in the neighborhood. However, without labels, it is not clear how the neighborhood information should be aggregated. Additionally, even if one can identify homophilic edges, under heterophily the number of such edges may be too small to learn high-quality representations via GNNs. To achieve rich representations in such graphs, it is crucial to not only aggregate representations of neighbors with the same label, but also push away representations of neighbors with different labels.
This allows learning richer node representations based on both similarities and dissimilarities of the nodes in different neighborhoods.

Next, we %discuss how we overcome the above challenges.
present our method, \alg, that can learn high-quality representations under heterophily.


\subsection{High-pass \& Low-pass graph CL (\alg)}
As discussed, under heterophily, leveraging node feature similarities is not enough for learning high-quality representations. It is crucial to capture the \textit{dissimilarities} between the neighboring nodes to separate different classes. A high-pass filter like the Laplacian matrix (\textit{cf.} Sec. \ref{sec:filter}) preserves the non-smooth graph component and captures the dissimilarity of the node features in a neighborhood. However, without labels, we cannot know whether the graph is homophilic or heterophilic, and naively using a high-pass filter instead of a low-pass filter significantly harms the performance under homophily. Moreover, most heterophilic graphs also contain several neighborhoods with homogeneous labels, as illustrated in Fig.~\ref{fig:mix_graph}. Hence, simply applying a high-pass filter to an unlabeled graph may result in poor performance. 

\textbf{Idea.} To learn rich node representations for both graph types, our main idea is to first %select edge subsets from the original graph to make homophily-subgraph and heterophilic subgraph. 
identify a homophilic subgraph and a heterophilic subgraph in the original graph.
Then, %for each subgraph, we generate two graph views by randomly corrupting the original subgraph \ba{corrupting nodes? edges?}. We
we augment each subgraph, and apply a low-pass filter to the augmented homophilic subgraphs and a high-pass filter to the augmented heterophilic subgraphs to obtain two high-pass and two low-pass filtered views for every node, using the \textit{same encoder}. 
%learn a high-pass and low-pass filtered view for every node by applying high-pass filter and low-pass filter to the heterophily and homophilic subgraph. 
The final representations are learned by contrasting the two high-pass filtered views and the two low-pass filtered views of every node.\looseness=-1

Next, we describe how \alg implements the above idea.

% The final representations are learned by contrasting the high-pass and low-pass filtered views of every node, as we will discuss in details next.

\noindent\textbf{Separating Subgraphs.}
\alg first identifies two subgraphs in the original graph: a homophilic subgraph with edges connecting nodes with homogeneous labels, and a heterophilic subgraph with edges connecting nodes with heterogeneous labels.

\noindent Formally, given a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, the heterophilic subgraph $\mathcal{G}^{het}=(\mathcal{V},\mathcal{E}^{het})$ and the homophilic subgraph $\mathcal{G}^{hom}=(\mathcal{V},\mathcal{E}^{hom})$ each contain all the nodes $\mathcal{V}$, and a subset of the edges of the original graph, i.e., $\mathcal{E}^{het}, \mathcal{E}^{hom}\subseteq \mathcal{E}$. We denote by $\pmb{A}^{het},\pmb{A}^{hom}\in\{0,1\}^{N\times N}$ the symmetric adjacency matrices of the subgraphs $\mathcal{G}^{het}$ and $\mathcal{G}^{hom}$, respectively. %That is, $\pmb{A}_{ij}^{het} = 1$ if and only if $(v_i,v_j) \in \mathcal{E}^{het}$, and $\pmb{A}_{ij}^{het} = 0$ otherwise. 
Note that the feature matrix $\pmb{X}$ of $\mathcal{G}$ is shared by $\mathcal{G}^{het}$ and $\mathcal{G}^{hom}$.
However, the neighborhood for a given node $i$ can be different in the two subgraphs. We define %the neighborhood of node $i$ in $\mathcal{G^{AOS}}$ as 
$\mathcal{N}^{het}_i=\{j: \pmb{A}^{het}_{ij}=1\}$ and $\mathcal{N}^{hom}_i=\{j: \pmb{A}^{hom}_{ij}=1\}$ as the neighborhood of node $i$ in $\mathcal{G}^{het}$, $\mathcal{G}^{hom}$, respectively. 
% \ul{Similarly, all other graph components associated with the homophilic subgraph are denoted using the superscript 'hom', while those pertaining to the heterophilic subgraph are indicated with the superscript 'het'.}
% \noindent 
Without any label supervision, we rely on the important observation that for graphs with different homophily ratios, the original features can approximately indicate the label information \citep{jin2021universal,chen2022towards,wang2020gcn,zhu2020beyond}. Based on this observation, we calculate pairwise feature similarities $s_{ij}=\left<\pmb{x}_{i.},\pmb{x}_{j.}\right>$ for all $i,j\in [N]$, where $N=|\mathcal{V}|$ and $\left<.,.\right>$ is the cosine similarity. Then, we first form the homophilic subgraph by selecting, in the neighborhood of every node $i$, the $k_1$ fraction of edges with the largest cosine similarities. 
Formally, $\mathcal{E}^{hom}\!=\!\big\{ (i,j)\,|\, i \!\in\! [N], j \!\in\! {\arg\max}_{P\subseteq \mathcal{N}_i, |P|=\lceil k_1\cdot|\mathcal{N}_i|\rceil} \sum_{p\in P} s_{ip}\big\}$. 
Next, we form the heterophilic subgraph using the $k_2$ fraction of edges in the neighborhood of every node with the lowest cosine similarities, i.e., $\mathcal{E}^{het}\!=\!\big\{ (i,j)\,|\, i\!\in\![N], j \!\in\! {\arg\min}_{P\subseteq \mathcal{N}_i, |P|=\lceil k_2\cdot|\mathcal{N}_i|\rceil} \sum_{p\in P} s_{ip}\big\}$.
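
As a hypothetical illustration of this selection rule (the numbers are made up for exposition and are not taken from any dataset), consider a node $i$ with $|\mathcal{N}_i|=5$ neighbors $j_1,\dots,j_5$ and cosine similarities
\begin{equation*}
(s_{ij_1},\, s_{ij_2},\, s_{ij_3},\, s_{ij_4},\, s_{ij_5}) = (0.9,\, 0.7,\, 0.4,\, 0.2,\, 0.1).
\end{equation*}
With the assumed values $k_1=0.4$ and $k_2=0.3$, node $i$ selects $\lceil 0.4\cdot 5\rceil = 2$ edges for $\mathcal{E}^{hom}$, namely $(i,j_1)$ and $(i,j_2)$, and $\lceil 0.3\cdot 5\rceil = \lceil 1.5\rceil = 2$ edges for $\mathcal{E}^{het}$, namely $(i,j_4)$ and $(i,j_5)$. The edge $(i,j_3)$ is selected by node $i$ for neither subgraph, so, depending on $k_1$ and $k_2$, some edges of $\mathcal{E}$ may be left out of both edge sets.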
% We form the heterophilic subgraph with the remaining edges, i.e., $\mathcal{E}^{het}=\mathcal{E}\setminus \mathcal{E}^{hom}$.
% $\mathcal{E}^{hom}=\{ (i,j)| i \in [n], j \in \mathcal{N}_i, j \in top k\% \{s_{i,j}\}\}$.
%
%
% Formally, we define $\mathcal{G^{\mathcal{AOS}}}=(\mathcal{V}, \mathcal{E}^{low})$ and  $\mathcal{G^{ATS}}=(\mathcal{V}, \mathcal{E}^{high})$, where $\mathcal{E}^{low}$ are the edges connecting the nodes of the same class and $\mathcal{E}^{high}$ are the edges connecting nodes of different classes, and $\mathcal{E} = \mathcal{E}^{low} \cup \mathcal{E}^{high}$. We denote the adjacency matrices for the subgraphs as $\pmb{A}^{\mathcal{AOS}}$ and $\pmb{A}^{\mathcal{ATS}}$, where $\pmb{A}^{\mathcal{AOS}}_{ij} = 1 $ if and only if $(v_i,v_j) \in \mathcal{E}^{low}$, and $\pmb{A}^{\mathcal{ATS}}_{ij} = 1 $ if and only if $(v_i,v_j) \in \mathcal{E}^{high}$. Similarly, we define the neighborhood of node $i$ in $\mathcal{G^{AOS}}$ as $\mathcal{N}^{\mathcal{AOS}}_i=\{j: \pmb{A}^{\mathcal{AOS}}_{ij}=1\}$ and the neighborhood in $\mathcal{G^{ATS}}$ as $\mathcal{N}^{\mathcal{ATS}}_i=\{j: \pmb{A}^{\mathcal{ATS}}_{ij}=1\}$ Note that both $\mathcal{G^{AOS}}$ and $\mathcal{G^{\mathcal{AOS}}}$ retain all the nodes in $\mathcal{G}$, so $\pmb{X}^{\mathcal{ATS}} = \pmb{X}^{\mathcal{AOS}} = \pmb{X}$ \vspace{2mm}
%
% $\mathcal{N}_i=\{j: \pmb{A}_{ij}=1\}$ is the neighborhood of node $i$
%
% As discussed before, high-pass filter is beneficial when applied to heterophilious neighborhoods, and low-pass filter is beneficial when applied to homophilious neighborhoods. Thus, the goal of the division is to divide the input graph $\mathcal{G}$ such that we can obtain $\mathcal{G^{AOS}}$ with more intra-class connections, and $\mathcal{G^{ATS}}$ with more inter-class connections. To do so, we rely on the important assumption that, for graphs with different homophily ratios, the original features may indicate the label information \cite{jin2021universal}. Specifically, we assign the edges in $\mathcal{E}$ via pair-wise similarity of node representations. For each node, we assign the top $k\%$ of the edges to $\mathcal{G^{AOS}}$, while the rest to $\mathcal{G^{ATS}}$. 
%

The initial subgraphs are constructed via original node features. However, the subgraphs are updated every $T$ epochs with the learned node representations during training. Note that, in contrast to prior work~\citep{chen2022towards}, we do not introduce new edges based on feature similarities throughout the training, which could change the semantic information of the graph \citep{he2023contrastive}.

% \hy{For implementation, to sample homophilic and heterophilic subgraphs, we use a two-step process: First, for each node, we select the top ceil($k_1$ fraction of edges) with highest cosine similarity to be in the homophilic subgraph. Then, we select the top ceil($k_2$ fraction of edges) with lowest cosine similarity for the heterophilic subgraph.} \ba{add to pseudocode (Alg 1)}

We note that not all nodes may be connected in each of the two subgraphs. However, as long as one subgraph is mostly connected, the information can be aggregated effectively and a satisfactory performance is obtained. 
For example, under homophily the heterophilic subgraph is small, but the homophilic subgraph contains almost all the nodes in the largest connected component of the original graph. 
Similarly, under extreme heterophily almost all the nodes are in the heterophilic subgraph and the homophilic subgraph is small and minimally affects the performance.
%%%%%%%%%
As \alg contrasts the augmented views of the homophilic and heterophilic subgraphs separately (it does not contrast the subgraphs with each other), only one subgraph needs to be mostly connected to achieve satisfactory performance.
In our ablation studies in Sec. \ref{sec:connectivity}, we confirm that in real-world graphs at least one of the subgraphs is mostly connected.
%%%%%%%%%
% Hence, the contrastive loss applied to the homophilic subgraph mostly determines the final results, and the second contrastive loss minimally affects performance. 





\textbf{Augmenting the Subgraphs.}
\label{sec:intro_aug}
Next, \alg generates two augmented views for each subgraph via random graph perturbations. We denote the two augmented graph views as $\mathcal{G}$ and $\tilde{\mathcal{G}}$. 
% \ul{To maintain consistency, all related graph components in the augmented view are similarly denoted with a tilde accent mark.}
For the homophilic subgraph, we follow \citep{zhu2020deep} and apply edge removal and feature masks as our graph augmentations. For the heterophilic subgraph, we apply node dropping and feature masks as our augmentations. We study the effects of different augmentation techniques on the heterophilic subgraph in Sec. \ref{sec:augmentation}. \looseness=-1

\noindent\textbf{Producing the Filtered Representations.}
% Having constructed $\mathcal{G}^{het}$ and $\mathcal{G}^{hom}$, 
Subsequently, \alg applies a high-pass filter to the two augmented views of the heterophilic subgraph, %containing nodes with different %features and labels, 
and a low-pass filter to the two augmented views of the homophilic subgraph, using the \textit{same encoder}. %By contrastive the augmented views of each subgraph, \alg learns high-quality representations. 
%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%
% The augmented views are then projected via 
% a 2-layer non-linear MLP, named projection head, %maps augmented node views 
% to another latent space where the contrastive losses are calculated  
% % We use a two-layer MLP with non-linear activations as the projection head and discard it after training, 
% % as advocated in 
% % that is attached to the model during training to project the representations into a space where contrastive loss is applied. 
% % After training, the projection head is discarded and a linear model is trained on the pre-projection representations. Projection head is a standard technique used in almost all contrastive learning methods on images
% \citep{chen2020simple,chen2021exploring, %and graphs 
% zhu2020deep,zhu2021empirical}. %containing nodes with similar %features and labels. 
% % In doing so, \alg generates distinguishable views for nodes with dissimilar features and labels in a neighborhood and similar views for nodes with similar features and labels in a neighborhood.
% Finally, \alg\ learns high-quality node representations by contrasting the low-pass augmented views
% %filtered views 
% of every node and contrasting the high-pass augmented views of every node, generated by the \textit{same encoder}. 
%
The shared encoder is crucial to ensure a good performance under both homophily and heterophily. %., as we will also confirm empirically.
% \hy{Note that, as shown in theorem\ref{the:major}, maintaining the invariance of both high-frequency and low-frequency components in the graph representation is crucial. Therefore, it is important to use a shared encoder to ensure that it is optimized by both high-pass and low-pass losses. Empirically, we have also observed that a shared encoder has better performance. Additionally, we want to train a single model that can process both homophilic and heterophilic graphs, rather than multiple models. }
% with the augmented alternative low-pass view and the high-pass filtered views of every node with the augmented alternative high-pass view. 

Specifically, to generate the low-pass and high-pass filtered node views, \alg leverages the renormalized adjacency matrices of the augmented homophilic subgraph and the renormalized Laplacian matrices of the augmented heterophilic subgraph, respectively.
% respectively. 
%%%%%%%%%%%%%%
% We denote by $\pmb{F}_{LP}$, $\tilde{\pmb{F}}_{LP}$ and $\pmb{F}_{HP}$, $\tilde{\pmb{F}}_{HP}$ the low-pass and high-pass filters corresponding to the two augmented views of the homophilic and heterophilic subgraphs, respectively.
%%%%%%%%%%%%%%
Formally, %$\pmb{F}_{LP}=\pmb{\hat{A}}_{sym}^{hom} = \pmb{\bar{D}}^{hom^{-\frac{1}{2}}}\pmb{\bar{A}}^{hom} \pmb{\bar{D}}^{hom^{-\frac{1}{2}}}$, 
$\pmb{F}_{LP}=\pmb{\hat{A}}_{sym}^{hom}$,
and 
% To generate the high-pass filtered node views, \alg leverages the renormalized Laplacian matrix of the augmented heterophilic subgraph
$\pmb{F}_{HP}=\pmb{\hat{L}}_{sym}^{het} = \pmb{I} - \pmb{\hat{A}}_{sym}^{het}$
are the low-pass and high-pass filters corresponding to the first augmented view of the homophilic and heterophilic subgraphs, and $\tilde{\pmb{F}}_{LP}$, $\tilde{\pmb{F}}_{HP}$ are the low-pass and high-pass filters corresponding to the second augmented view of the homophilic and heterophilic subgraphs.
Effectively, $\pmb{F}_{LP}, \tilde{\pmb{F}}_{LP}$
are the aggregation operations in Eq. \eqref{eq:operations_A} and $\pmb{F}_{HP}, \tilde{\pmb{F}}_{HP}$ are diversification operations in Eq. %\ref{eq:operations_A}),
\eqref{eq:operations_L}. Then, the two low-pass filtered views of the homophilic subgraph are obtained as follows:
% To generate the high-pass filtered node views, \alg leverages the normalized Laplacian matrix of the augmented heterophilic subgraph $\pmb{F}_{HP}=\pmb{\hat{L}}_{sym}^{het} = $ and normalized adjacency matrix of the augmented homophilic subgraph $\pmb{F}_{LP}=\pmb{\hat{A}}_{sym}^{hom} = \pmb{\bar{D}}^{-\frac{1}{2}}\pmb{\bar{A}} \pmb{\bar{D}}^{-\frac{1}{2}}$, as the diversification and aggregation operations in Eq. (\ref{eq:operations_A}), (\ref{eq:operations_L}). That is, the high-pass and low-pass filtered views are obtained as follows:
% We note that other types of high-pass and low-pass filters can %be used instead of $\pmb{\hat{L}}_{sym}$ and $\pmb{\hat{A}}_{sym}$
% be used in a similar way in our framework.
%More specifically, we input the subgraphs $\mathcal{G^{AOS}}$ and $\mathcal{G^{ATS}}$, \ba{we need new notations for the adjacency and Laplacian of subgraphs} obtained via local graph separation, and input them into a graph encoder. By applying the high-pass filter $\pmb{F}_{HP}=\pmb{\hat{L}}_{sym}$ to $\mathcal{G^{ATS}}$ and low-pass filter $\pmb{F}_{LP}=\pmb{\hat{A}}_{sym}$ to $\mathcal{G^{AOS}}$, we obtain the high-pass node representations $\pmb{H}_H$ and low-pass node representations $\pmb{H}_L$ as follows: 
\begin{align}
\pmb{H}_{L}^l &= \sigma(\pmb{F}_{LP} \pmb{H}^{l-1}_{L} \pmb{W}^{l-1}_{}), \label{eq:low}\\
\tilde{\pmb{H}}_{L}^l &= \sigma(\tilde{\pmb{F}}_{LP} \tilde{\pmb{H}}^{l-1}_{L} \pmb{W}^{l-1}_{}), \label{eq:low_aug}
% \\ \pmb{H}_{L}^l &= \sigma(\pmb{F}_{LP} \pmb{H}^{l-1}_L \pmb{W}^{l-1}_{}),
% \\ &\pmb{H}^0_L=\pmb{H}^0_H=\pmb{X}.
\end{align}
and the two high-pass filtered views of the heterophilic subgraph are obtained as follows:
\begin{align}
\pmb{H}_{H}^l &= \sigma(\pmb{F}_{HP} \pmb{H}^{l-1}_{H} \pmb{W}^{l-1}_{}), \label{eq:high}\\ 
\tilde{\pmb{H}}_{H}^l &= \sigma(\tilde{\pmb{F}}_{HP} \tilde{\pmb{H}}^{l-1}_{H} \pmb{W}^{l-1}_{}).\label{eq:high_aug}
% \\ &\pmb{H}^0_L=\pmb{H}^0_H=\pmb{X}.
\end{align}
where $\pmb{H}^l_{L}, \tilde{\pmb{H}}^l_{L}$ are the low-pass filtered augmented views at layer $l$ of the encoder, $\pmb{H}^l_{H}, \tilde{\pmb{H}}^l_{H}$ are the high-pass filtered augmented views at layer $l$ of the encoder, $\pmb{W}^{l-1}\in\mathbb{R}^{d_{l-1} \times d_{l}}$ is the weight matrix mapping the representations of layer $l-1$ to those of layer $l$, $\sigma$ is the activation function, and we have $\pmb{H}^0_L=\pmb{H}^0_H=\pmb{X}$ and $\tilde{\pmb{H}}^0_L=\tilde{\pmb{H}}^0_H=\tilde{\pmb{X}}$, where $\pmb{X}, \tilde{\pmb{X}}$ are the augmented feature matrices. \looseness=-1 

$\pmb{F}_{HP}, \tilde{\pmb{F}}_{HP}$ filter out the low-frequency signals and preserve the high-frequency signals. In doing so, they capture the difference between the features of each node and those of its neighbors.
Using a high-pass filter within a multi-layer encoder iteratively captures the difference between the features of the nodes in a multi-hop neighborhood of a node in the heterophilic subgraph. Hence, it makes the representations of nodes that have different features from their neighbors distinct in their multi-hop neighborhood. 
On the other hand, $\pmb{F}_{LP}, \tilde{\pmb{F}}_{LP}$ only preserve the low-frequency signals by aggregating every node's features with those of its immediate neighborhood. 
Using a low-pass filter within a multi-layer graph encoder iteratively aggregates features in a multi-hop neighborhood of every node in the homophilic subgraph to learn its representation. 
Hence, the low-pass filters smooth out the node representations and produce similar representations for the nodes within the same multi-hop neighborhood.
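
To make the two effects concrete, consider a toy subgraph with two nodes joined by a single edge and scalar features $x_1, x_2$. This is only an illustrative sketch: it assumes the usual renormalization with self-loops, i.e., $\pmb{\bar{A}}=\pmb{A}+\pmb{I}$ with degree matrix $\pmb{\bar{D}}$, so that $\pmb{\hat{A}}_{sym}=\pmb{\bar{D}}^{-\frac{1}{2}}\pmb{\bar{A}}\pmb{\bar{D}}^{-\frac{1}{2}}$. Then
\begin{equation*}
\pmb{\hat{A}}_{sym} = \frac{1}{2}\begin{pmatrix} 1 & 1\\ 1 & 1\end{pmatrix},\qquad
\pmb{\hat{L}}_{sym} = \pmb{I}-\pmb{\hat{A}}_{sym} = \frac{1}{2}\begin{pmatrix} 1 & -1\\ -1 & 1\end{pmatrix},
\end{equation*}
and applying the two filters to the features gives
\begin{equation*}
\pmb{\hat{A}}_{sym}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \frac{1}{2}\begin{pmatrix} x_1+x_2\\ x_1+x_2\end{pmatrix},\qquad
\pmb{\hat{L}}_{sym}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \frac{1}{2}\begin{pmatrix} x_1-x_2\\ x_2-x_1\end{pmatrix}.
\end{equation*}
In this toy case, the low-pass output is the neighborhood average and is identical for the two nodes, whereas the high-pass output keeps only their difference and vanishes when $x_1=x_2$.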

Note that we use the Laplacian and adjacency matrices of the augmented subgraphs instead of those of the original graph, as they indicate how the information in different neighborhoods should be aggregated by the GCN encoder. Indeed, it is important to use the corresponding matrices in the subgraphs. In doing so, we pull together representations of nodes within label-homogeneous neighborhoods by applying low-pass filters to homophilic subgraphs, and push away representations of nodes within label-heterogeneous neighborhoods by applying high-pass filters to heterophilic subgraphs. If both filters were applied to the original graph, representations of the nodes within each neighborhood would be pulled together and pushed apart at the same time.

Using both high-pass and low-pass filters provides complementary information and allows learning both smooth and non-smooth components of the graphs simultaneously, which is particularly useful for graphs under heterophily. We note that other types of high-pass and low-pass filters can be used in a similar way in our framework.

% \noindent\textbf{Contrastive loss.} 
\noindent\textbf{Contrasting the Filtered Representations.}
Finally, by contrasting the augmented views of each subgraph, \alg learns high-quality representations. 
%%%%%%%%%%%%%%%%%%
The augmented views $\pmb{H}, \tilde{\pmb{H}}$ are first projected via 
a 2-layer non-linear MLP, named projection head, %maps augmented node views 
to another latent space $\pmb{z}, \tilde{\pmb{z}}$ where the contrastive losses are calculated, as advocated by  \citep{chen2020simple,chen2021exploring, %and graphs 
zhu2020deep,zhu2021empirical}. 
% Finally, \alg\ learns high-quality node representations by contrasting the low-pass augmented views
% %filtered views 
% of every node and contrasting the high-pass augmented views of every node, generated by the \textit{same encoder}. 
%%%%%%%%%%%%%%%%%%%

Then, for each subgraph, we first consider every node $i$ in the first augmented subgraph view as the anchor, and contrast it with all the nodes in the second augmented subgraph view. This yields the following contrastive losses for the homophilic and heterophilic subgraphs, respectively:
% For every node \(i\), \hy{we first generate a pair of augmented node views. For the heterophilic subgraph, we consider the first augmented view, $z$, as the anchor to get $l(z_h^i, \tilde{z}_h^i)$, and then consider the second augmented view, $\tilde{z}$ as the anchor to get $l(\tilde{z}_h^i, z_h^i)$. In a similar manner, we get two terms for the homophilic subgraph. For each anchor node view, their corresponding augmented representation, \(\tilde{\pmb{z}}_h^i\), \(\tilde{\pmb{z}}_l^i\), as their positive samples. The rest of the projected representations from the other views are treated as the negative samples.}
% Formally, we have: 
\begin{align}
\hspace{-3mm}
l(\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) &= \log \frac{e^{\text{sim} (\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) / \tau}}{
e^{\text{sim} (\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) / \tau}
+ \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_l^i, \tilde{\pmb{z}}_l^k) / \tau}}
\label{eq:infoce_low}\\
% \end{equation}
% \begin{equation}
\hspace{-3mm}
l(\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) &= \log \frac{e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) / \tau}}{
e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) / \tau}
+ \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^k) / \tau}},
\label{eq:infoce_high}
\end{align}
% \begin{equation}
% l(\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) = \log \frac{e^{\text{sim} (\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) / \tau}}{
% \splitfrac{\textstyle
% e^{\text{sim} (\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) / \tau} + \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_l^i, \pmb{z}_l^k) / \tau}}
% {\textstyle
% + \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_l^i, \tilde{\pmb{z}}_l^k) / \tau}}}
% \end{equation}
% \begin{equation}
% l(\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) = \log \frac{e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) / \tau}}{
% \splitfrac{\textstyle
% e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) / \tau} + \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_h^i, \pmb{z}_h^k) / \tau}}
% {\textstyle
% + \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^k) / \tau}}}
% \end{equation}
%
 % \pmb{z}_h^i, \tilde{\pmb{z}}_h^i
% \log \frac{e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) / \tau}}{e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) / \tau} + \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_h^i, \pmb{z}_h^k) / \tau} + \sum_{\substack{k\in[N],\\k \neq i}} e^{\text{sim} (\pmb{z}_h^i, \tilde{\pmb{z}}_h^k) / \tau}
% \noindent 
where $\text{sim}$ is the cosine similarity between the projected representations, and $\tau$ is a temperature parameter. The second term in the denominator represents the inter-view negative pairs, i.e., the pairs formed by the anchor view of node \( i \) and the representations of all other nodes in the other augmented view.
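
As a small worked instance (with made-up similarity values chosen only for exposition), take $N=2$ and $\tau=1$, and suppose that $\text{sim}(\pmb{z}_l^1,\tilde{\pmb{z}}_l^1)=1$ and $\text{sim}(\pmb{z}_l^1,\tilde{\pmb{z}}_l^2)=0$. Then Eq. \eqref{eq:infoce_low} gives
\begin{equation*}
l(\pmb{z}_l^1,\tilde{\pmb{z}}_l^1)=\log\frac{e^{1}}{e^{1}+e^{0}}=-\log\bigl(1+e^{-1}\bigr)\approx -0.313 .
\end{equation*}
The value is negative whenever at least one negative pair is present, and its magnitude shrinks as the similarity of the positive pair grows relative to the similarities of the negative pairs.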
% Notably, the high-pass filter alleviates the need for using negative pairs in the contrastive loss, by automatically producing dissimilar representations for different nodes. Empirically, we also observed no improvement in the performance by incorporating the negative pairs, in presence of the high-pass filter. 

Similarly, for each subgraph we also consider the second augmented view of node $i$ as the anchor and contrast it with all the nodes in the first augmented subgraph view. 
Since the two views are symmetric, the corresponding losses $l(\tilde{\pmb{z}}_l^i, \pmb{z}_l^i)$ and $l(\tilde{\pmb{z}}_h^i, \pmb{z}_h^i)$ are defined analogously. The overall objective to be minimized is then defined as the average over all four contrastive losses. 
Formally, we minimize: 
\vspace{-2mm}
\begin{equation}
% \begin{split}
% \vspace{-2mm}
\mathcal{L}_\text{\alg} \!\!=\! -\frac{1}{4N} \!\sum_{i = 1}^{N} [l(\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) + l(\pmb{z}_h^i, \tilde{\pmb{z}}_h^i) +l(\tilde{\pmb{z}}_l^i, \pmb{z}_l^i) + l(\tilde{\pmb{z}}_h^i, \pmb{z}_h^i)].\label{eq:loss}
% \end{split}
\end{equation}
Effectively, by maximizing the agreement between the low-pass views and between the high-pass views, \alg pushes the representations of nodes whose features differ from those of their neighbors away from their neighborhood, and allows them to be distinguished from their neighbors.



\noindent\textbf{Final representations.}
After minimizing the contrastive loss in Eq. \eqref{eq:loss}, we use the low-pass filtered representations as the final output. 

The pseudocode of \alg is given in Alg. \ref{alg:alg}.


\begin{algorithm}[t]
  \caption{High-pass and Low-pass Graph CL (\alg) }\label{alg:alg}
  \begin{algorithmic}[1]
  \State Infer subgraph $\mathcal{G}^{hom}$ by selecting the top $\lceil k_1 \times |\mathcal{N}_i| \rceil$ edges with highest cosine similarity for every node $i$.
  \State Infer subgraph $\mathcal{G}^{het}$ by selecting the top $\lceil k_2 \times |\mathcal{N}_i| \rceil$ edges with lowest cosine similarity for every node $i$.
  % and $\mathcal{G}^{het}$ based on cosine similarity between nodes in $\mathcal{G}$: select the top $\lceil k_1$ fraction of edges $\rceil$ with highest cosine similarity for $\mathcal{G}^{hom}$ and the top $\lceil k_2$ fraction of edges $\rceil$ with lowest cosine similarity for $\mathcal{G}^{het}$.
    \For{epoch $=1,2,3,\cdots$}
    \State Obtain augmented graph views $\mathcal{G}^{hom},\tilde{\mathcal{G}}^{hom},\mathcal{G}^{het},\tilde{\mathcal{G}}^{het}$ via random perturbations.
      % \State Input generated graphs into the graph Encoder $f(\cdot)$
      \State Generate high-pass node representations $\pmb{H}_H, \tilde{\pmb{H}}_H$ based on Eq. \eqref{eq:high}, \eqref{eq:high_aug}, using encoder weights $\pmb{W}$. %$\mathcal{G}^{het}, \tilde{\mathcal{G}}^{het}$, i.e., $\pmb{H}_{H}^l = \sigma(\pmb{F}_{HP} \pmb{H}^{l-1}_{H} \pmb{W}^{l-1}_{})$, where $\pmb{F}_{HP}=\pmb{\hat{L}}_{sym}^{het}$
       \State Generate low-pass node representations $\pmb{H}_L, \tilde{\pmb{H}}_L$ based on Eq. \eqref{eq:low}, \eqref{eq:low_aug}, using encoder weights $\pmb{W}$.\looseness=-1
       %$\mathcal{G}^{hom}, \tilde{\mathcal{G}}^{hom}$, i.e., $\pmb{H}_{L}^l = \sigma(\pmb{F}_{LP} \pmb{H}^{l-1}_{H} \pmb{W}^{l-1}_{})$, where $\pmb{F}_{LP}=\pmb{\hat{A}}_{sym}^{hom}$
    %   \State \ba{projection head}
      \State Compute the contrastive objective $\mathcal{L}_{HLCL}$ in Eq. \eqref{eq:loss}. \looseness=-1
      \State Update the encoder weights $\pmb{W}$ by applying stochastic gradient descent to minimize $\mathcal{L}_{HLCL}$.
      \If{epoch \% $T = 0$} 
    \State Update $\mathcal{G}^{het}$, $\mathcal{G}^{hom}$, $\tilde{\mathcal{G}}^{het}$, $\tilde{\mathcal{G}}^{hom}$ based on current node representations.
        \EndIf
    \EndFor
  \end{algorithmic}
  % \vspace{-3mm}
\end{algorithm}



\noindent\textbf{Scalability to Large Graphs via Message Passing.} The high-pass and low-pass filtered representations can be obtained through message passing in an inductive manner, according to Eq. (\ref{eq:operations_A}), (\ref{eq:operations_L}), without the need to explicitly calculate the normalized adjacency and Laplacian matrices. In particular, the high-pass filtered representations can be obtained by iteratively taking the difference between the representation of a node and those of its neighbors, and the low-pass filtered representations can be obtained by aggregating the node's representation with those of its neighbors:
\begin{align}
    \pmb{h}_i^l&=\sigma(\pmb{W}^{l-1}\pmb{h}_i^{l-1}),\\
    (\pmb{h}_i^l)_L&=\sum_{j\in\mathcal{N}^{hom}_i\cup \{i\}} (\pmb{h}_i^l+\pmb{h}_j^l),\\
    (\pmb{h}_i^l)_H&=\sum_{j\in\mathcal{N}^{het}_i\cup \{i\}} (\pmb{h}_i^l-\pmb{h}_j^l).
\end{align}
The above update rules can be applied to both augmented subgraphs.
This is the same approach used to train GNNs on large graphs.
Hence, \alg will have the same complexity as conducting a normal GNN message passing with an additional message being passed to generate the high-pass filtered views. This makes \alg scalable to large graphs, as we will also confirm in our experiments.
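
For illustration only, consider scalar representations with $\pmb{h}_i^l=1$, and suppose (hypothetically) that node $i$ has two neighbors in each subgraph, with representations $3$ and $5$. The low-pass and high-pass update rules above then give
\begin{align*}
(\pmb{h}_i^l)_L &= (1+1)+(1+3)+(1+5)=12,\\
(\pmb{h}_i^l)_H &= (1-1)+(1-3)+(1-5)=-6,
\end{align*}
where the first summand in each line comes from $j=i$. Each output needs only the representations of node $i$ and of its neighbors in the corresponding subgraph, so no $N\times N$ filter matrix has to be formed.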

In addition, we will empirically confirm in Appendix \ref{sec:compare} that directly contrasting the high-pass %filtered representations with the low-pass filtered 
and low-pass filtered
representations can produce comparable results to \alg, while speeding up the algorithm by $2\times$, as it requires minimizing only one pair of contrastive losses. 
% We compared the performance of \alg and its simplified version in Appendix \ref{sec:compare}.


\subsection{Theoretical Analysis}
Next, %from a spectral view, 
we theoretically prove that by separating the graph into homophilic and heterophilic subgraphs and applying low-pass and high-pass filters on them respectively, \alg can %learn superior representations. 
encode both low-frequency and high-frequency information in the learned representations.

Following \citep{liu2022revisiting}, we simplify the contrastive losses in Eq. \eqref{eq:infoce_low}, \eqref{eq:infoce_high} by assuming $\tau = 1$ and using the inner product for $\text{sim}$. Additionally, we assume
a one-layer linear encoder.
% without non-linear activation.}
\newtheorem{theorem}{Theorem}
\begin{theorem}[\alg: Spectral Invariance] \label{the:major}
Under the above assumptions and given ideal subgraphs $\mathcal{G}^{hom}$ and $\mathcal{G}^{het}$,
% we denote the adjacency matrix of \( G_{\text{hom}} \) as \( A \) and its generated augmentation's adjacency matrix as \( V_A \); we denote the Laplacian matrix of \( G_{\text{hetero}} \) as \( L \), and its augmented graph's Laplacian matrix as \( V_L \). The amplitude of the \( i \)-th frequency of \( A \) and \( V_A \) are denoted as \( \lambda_{A_i} \) and \( \lambda_{V_{A_i}} \), respectively, and the amplitude of the \( i \)-th frequency of \( L \) and \( V_L \) are denoted as \( \lambda_{L_i} \) and \( \lambda_{V_{L_i}} \), respectively. 
% We establish 
the \alg loss can be lower-bounded as follows: %following lower bound:
\begin{align*}
\mathcal{L}_\text{\alg} &\geq \frac{-1 - N}{2} \sum_i \Bigl( \alpha_{A_i} \bigl(2 - (\lambda_{\pmb{A}_{i}^{hom}} - \lambda_{\tilde{\pmb{A}}_{i}^{hom}})^2\bigr) \\
&\qquad + \alpha_{L_i} \bigl(4 - (\lambda_{\pmb{L}_{i}^{het}} - \lambda_{\tilde{\pmb{L}}_{i}^{het}})^2\bigr)\Bigr),
\end{align*}
where $\lambda_{\pmb{A}_{i}^{hom}}, \lambda_{\tilde{\pmb{A}}_{i}^{hom}}$ denote the $i$-th eigenvalues of the low-pass filters corresponding to the two augmented views of the homophilic subgraph,
$\lambda_{\pmb{L}_{i}^{het}}, \lambda_{\tilde{\pmb{L}}_{i}^{het}}$
denote the $i$-th eigenvalues of the high-pass filters corresponding to the two augmented views of the heterophilic subgraph, and $\alpha_{A_i}$, $\alpha_{L_i}$ are adaptive weights that change during training as the parameters of the encoder change. 
\end{theorem}

% The above theorem shows that by minimizing the \alg loss during the training, the parameters of the encoder change such that the lower bound becomes small. 
% In doing so, the encoder changes such that it assigns a larger weight ($\alpha_{A_i}$ and $\alpha_{L_i}$) to invariant frequencies $i$, for which $\lambda_{A_i}^{hom}$ $\sim$ $\hat{\lambda}_{A_i}^{hom}$ and $\lambda_{L_i}^{het}$ $\sim$ $\hat{\lambda}_{L_i}^{het}$, i.e., the two contrasted augmentations are invariant at frequency $i$.

Theorem \ref{the:major} provides a lower-bound for the \alg loss.
%%%
%%%
The lower-bound is in the form of a summation of two terms: the first term is the sum of the difference between the low-frequency components of the two low-pass filtered augmented views of the homophilic subgraph, %i.e. $\frac{-1-N}{2}\sum_i \alpha_{A_i} (2- (\lambda_{A_i}^{\text{hom}}-\lambda_{\tilde{A}_i}^{\text{hom}})^2$), 
and the second term is the sum of the difference between the two high-pass filtered augmented views of the heterophilic subgraph. %, i.e. $\frac{-1-N}{2}\sum_i \alpha_{L_i}(4-(\lambda_{L_i}^{\text{het}}-\lambda_{\tilde{L}_i}^{\text{het}})^2)$. 
Minimizing the \alg loss ensures a small value for the lower bound. 
In doing so, the encoder changes such that it assigns a larger weight ($\alpha_{A_i}$ and $\alpha_{L_i}$) to invariant frequencies $i$, for which $\lambda_{\pmb{A}_{i}^{hom}} \approx \lambda_{\tilde{\pmb{A}}_{i}^{hom}}$ and $\lambda_{\pmb{L}_{i}^{het}} \approx \lambda_{\tilde{\pmb{L}}_{i}^{het}}$.
% In doing so, %the entire expression inside the sum (without the negative sign before the sum) is maximized, hence 
% larger $\alpha{A_i}$ will be assigned to the smaller $(\lambda_{A_i}^{hom}-\lambda_{\tilde{A}_i}^{hom})^2$, hence, $(\lambda_{A_i}^{hom} \sim \lambda_{\tilde{A}_i}^{hom})$. 
Notably, $\lambda_{\pmb{A}_{i}^{hom}} \approx \lambda_{\tilde{\pmb{A}}_{i}^{hom}}$ implies that the two contrasted augmentations are invariant at the $i$-th frequency. The same reasoning holds for the second term. Therefore, during training with \alg, the encoder will emphasize the invariance between the two contrasted augmentations in the spectral domain, for both the homophilic and heterophilic subgraphs.
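
To see the weighting argument on a single frequency, write $\delta_i=\lambda_{\pmb{A}_{i}^{hom}}-\lambda_{\tilde{\pmb{A}}_{i}^{hom}}$ (a shorthand introduced here only for this illustration). The contribution of frequency $i$ to the homophilic part of the bound in Theorem \ref{the:major} is
\begin{equation*}
-\frac{N+1}{2}\,\alpha_{A_i}\bigl(2-\delta_i^2\bigr),
\end{equation*}
which equals $-(N+1)\,\alpha_{A_i}$ when $\delta_i=0$ and $-\frac{N+1}{2}\,\alpha_{A_i}$ when $|\delta_i|=1$. Assuming $\alpha_{A_i}\geq 0$ (an assumption made here for exposition), the same weight therefore lowers the bound twice as much on the invariant frequency, which is consistent with larger weights being placed on frequencies where the two augmentations agree.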


The proof is given in Appendix~\ref{sec:proof}.