\section{Appendix}
\subsection{Hyperparameter Details}
\label{sec:hype}
We list the model hyperparameters used for each dataset in Table \ref{tab:hyperparameter}.
\begin{table}[h]
\centering
\setlength{\tabcolsep}{3pt}
\small % Reducing the font size
\caption{\alg hyperparameters for each dataset}
\label{tab:hyperparameter}
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|}
\hline
            & Cora & CiteS & Pubmed & Actor & Cham & Squir & Penn & TwitchG & Genius \\ \hline
$k_2$       & 0.9  & 0.9   & 0.8    & 0.1   & 0.2  & 0.5   & 1.0  & 1.0     & 1.0    \\ \hline
$k_1$       & 0.1  & 0.1   & 0.2    & 0.9   & 0.8  & 0.5   & 1.0  & 1.0     & 1.0    \\ \hline
lr          & 1e-3 & 1e-3  & 1e-3   & 1e-3  & 1e-3 & 1e-3  & 1e-3 & 1e-3    & 1e-3   \\ \hline
$T$         & 50   & 50    & 50     & 50    & 50   & 50    & 50   & 50      & 50     \\ \hline
\end{tabular}
\end{table}

\subsection{Dataset Details}
\label{sec:dataset}
For graphs with homophily, we use the citation networks Cora, Citeseer, and Pubmed~\citep{yang2016revisiting}. For graphs with heterophily, we use the Wikipedia and web page networks, including Chameleon, Squirrel, and Actor~\citep{rozemberczki2021multi,pei2020geom}. Note that, for a fair comparison, we adopt the versions of Chameleon and Squirrel from \citet{platonov2023critical}, in which duplicated nodes are removed. To illustrate the scalability of \alg, we also include three large-scale real-world datasets, Penn94, Genius, and Twitch-gamers, provided by \citet{lim2021large}.
% \begin{table*}[t]
%   \centering
% {%\footnotesize 
% \small
% % \vspace{-7mm}
% \caption{Statistics of datasets used in experiments}\label{tab:dataset}
% \vspace{-1mm}
% {\small
%   \begin{tabular} {c c c| c c c c c}
%     \toprule
% &\multicolumn{2}{c}{Homophily}&\multicolumn{5}{|c}{Heterophily}\\\midrule
% &Cora & CiteSeer  & Actor & Chameleon & Squirrel  & Penn94 & Twitch-gamers \\
% \toprule
% \multicolumn{1}{c|}{Hom.($\beta$)} & $.83$ & $.71$ & $.09$ & $.23$ & $.19$ & $.48$ & $.56$\\
% \multicolumn{1}{c|}{Nodes} & 2,708 & 3,327 & 5,201 & 2,277 & 5,201 & 41,554 & 168,114\\
% \multicolumn{1}{c|}{Edges} & 5,278 & 4,676 & 198,493 & 8,854 & 46,998 & 1,362,229 & 6,797,557\\
% \multicolumn{1}{c|}{Classes} & 6 & 7 & 5 & 5 & 5 & 2 & 2\\
% \bottomrule
% \end{tabular}
% }
% }%\vspace{-5mm}
% \end{table*}

\subsection{Existing GCL methods with HP filters}
\label{sec:gcl_hp}
\vspace{-1mm}
In this section, we illustrate the importance of our contrastive structure in achieving performance gains on heterophily datasets. We show that contrasting both the low-pass filtered graph views and the high-pass filtered graph views is crucial for obtaining high-quality representations under heterophily, as opposed to merely applying a high-pass filter. To do so, we replace the LP filter with an HP filter in other popular graph CL methods. The results are shown in Table \ref{tab:swap}. As demonstrated, while there are performance gains on some heterophily datasets, accuracy deteriorates significantly in homophily settings (e.g., from 84.53 to 31.95 for DGI on Cora).\looseness=-1
For larger values of the homophily ratio $\beta$, nodes with the same label are more likely to be connected.
In graphs with a large homophily ratio, most neighborhoods have homogeneous labels. On the other hand, graphs with a small homophily ratio contain both neighborhoods with homogeneous labels and neighborhoods with heterogeneous labels, as illustrated in Fig. \ref{fig:mix_graph}.
Existing graph CL methods perform poorly under heterophily, i.e., at a low homophily ratio, and cannot learn high-quality representations.\looseness=-1
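
For concreteness, one common way to formalize the homophily ratio is the edge homophily ratio. We state it here as an assumption about the definition of $\beta$; the definition given in the main text takes precedence. Writing $\mathcal{E}$ for the edge set and $y_v$ for the label of node $v$ (notation introduced only for this illustration),
\[
\beta = \frac{|\{(u,v)\in\mathcal{E} : y_u = y_v\}|}{|\mathcal{E}|} \in [0,1].
\]
Under this reading, the value $\beta = 0.83$ reported for Cora in Table \ref{tab:feature_sample} means that roughly $83\%$ of the edges connect nodes with the same label, whereas $\beta = 0.23$ for Chameleon means that only about $23\%$ do.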
\begin{table*}[ht!]
  \centering %\footnotesize 
\caption{%Graph CL under linear probe.
% Swapping low-pass encoders in GCL methods to high-pass encoders.
Using a high-pass filter in existing graph CL methods. \alg denotes our method.}\label{tab:swap}
\vspace{-1mm}
  \begin{tabular} {c c c| c c c}
    \toprule
%\multirow{2}{4em}{Method}
&\multicolumn{2}{c}{Homophily}&\multicolumn{3}{|c}{Heterophily}\\\cmidrule{2-6}
&Cora & CiteSeer & Chameleon & Squirrel & Actor\\\midrule
\multicolumn{1}{c|}{\textbf{\alg}} & 84.1 $\pm$ 1.0 & 70.1 $\pm$ 0.8 & 50.9 $\pm$ 1.0 & 42.9 $\pm$ 2.6 & 34.0 $\pm$ 0.2\\\midrule
% \multicolumn{1}{c|}{${\alg}_{full}$} & 82.34 $\pm$ 2.7 & 71.35 $\pm$ 1.4 & 35.28 $\pm$ 2.7 & 38.02 $\pm$ 2.6 & 30.74 $\pm$ 1.7\\\midrule
\multicolumn{1}{c|}{DGI:low}  & 84.53 $\pm$ 1.1 & 71.88  $\pm$ 0.7 & 32.58 $\pm$ 2.9 & 38.83 $\pm$ 2.3 & 28.00 $\pm$ 1.4 \\
\multicolumn{1}{c|}{DGI:high} & 31.95 $\pm$ 2.8 &  30.54 $\pm$ 1.7 & 29.89 $\pm$ 3.0 & 36.86 $\pm$ 3.0 & 32.03 $\pm$ 0.9\\\midrule
\multicolumn{1}{c|}{BGRL:low} & 83.01 $\pm$ 0.7 & 69.81 $\pm$ 0.6 & 32.58 $\pm$ 4.7 & 35.70 $\pm$ 1.4 &  28.32 $\pm$ 0.9\\
\multicolumn{1}{c|}{BGRL:high}  & 29.63 $\pm$ 2.8 & 24.99 $\pm$ 3.1 & 37.30 $\pm$ 6.0 & 38.03 $\pm$ 1.1 & 33.87 $\pm$ 1.9 \\\midrule
\multicolumn{1}{c|}{GRACE:low} & 83.69 $\pm$ 0.7 & 71.37 $\pm$ 1.0 & 35.39 $\pm$ 3.6 & 36.18 $\pm$ 2.8 & 34.5 $\pm$ 1.1\\
\multicolumn{1}{c|}{GRACE:high}  & 32.46 $\pm$ 2.0 & 26.55 $\pm$ 3.1 & 33.03 $\pm$ 3.9 & 33.05 $\pm$ 2.1 & 32.00 $\pm$ 1.3 \\
\bottomrule
\end{tabular}
\end{table*}


\subsection{Simplified \alg}
\label{sec:compare}
Empirically, we observed that directly contrasting the high-pass filtered representations with the low-pass filtered representations can produce results comparable to \alg, as shown in Table \ref{tab:compare}. This simplified version can speed up the algorithm by $2\times$, as it requires only one contrastive learning process.
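
As an illustration only, a single cross-filter contrastive term of this kind could take an InfoNCE-style form. The symbols below are assumptions introduced for this sketch and need not match the objective defined in the main text: $z_i^{\mathrm{HP}}$ and $z_i^{\mathrm{LP}}$ denote the high-pass and low-pass filtered representations of node $i$, $N$ the number of nodes, $\mathrm{sim}(\cdot,\cdot)$ the cosine similarity, and $\tau > 0$ a temperature:
\[
\mathcal{L}_{\mathrm{simp}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp\bigl(\mathrm{sim}(z_i^{\mathrm{HP}}, z_i^{\mathrm{LP}})/\tau\bigr)}{\sum_{j=1}^{N}\exp\bigl(\mathrm{sim}(z_i^{\mathrm{HP}}, z_j^{\mathrm{LP}})/\tau\bigr)}.
\]
In this sketch, each node's high-pass representation is pulled toward its own low-pass representation and pushed away from the low-pass representations of the other nodes. If the full method evaluates two contrastive terms of comparable cost per step, evaluating a single one would account for the roughly $2\times$ speed-up.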

\begin{table*}[ht!]
  \centering %\footnotesize 
\caption{Comparison between \alg and Simplified \alg}\label{tab:compare}
\vspace{-1mm}
  \begin{tabular} {c c c| c c c}
    \toprule
%\multirow{2}{4em}{Method}
&\multicolumn{2}{c}{Homophily}&\multicolumn{3}{|c}{Heterophily}\\\cmidrule{2-6}
&Cora & CiteSeer & Chameleon & Squirrel & Actor\\\midrule
\multicolumn{1}{c|}{\textbf{\alg}} & 84.1 $\pm$ 1.0 & 70.1 $\pm$ 0.8 & 50.9 $\pm$ 1.0 & 42.9 $\pm$ 2.6 & 34.0 $\pm$ 0.2\\\midrule
% \multicolumn{1}{c|}{${\alg}_{full}$} & 82.34 $\pm$ 2.7 & 71.35 $\pm$ 1.4 & 35.28 $\pm$ 2.7 & 38.02 $\pm$ 2.6 & 30.74 $\pm$ 1.7\\\midrule
\multicolumn{1}{c|}{$\alg_{Simplified}$}  & 83.5 $\pm$ 2.7 & 71.8  $\pm$ 1.4 & 48.3 $\pm$ 6.8 & 39.5 $\pm$ 5.3 & 35.5 $\pm$ 1.9 \\
\bottomrule
\end{tabular}
\end{table*}
\subsection{Using features to infer label information}
\label{sec:feature_label}
{\alg uses feature information to approximately estimate the label information. Here, we justify this choice empirically and demonstrate that, while feature information can help in approximately inferring subgraphs, it cannot be used for accurate node classification.
First, we show that the node features are sufficient to give approximate neighborhood information, which is helpful in splitting the graph into subgraphs. We report the homophily ratios of the original graph, the homophilic subgraph, and the heterophilic subgraph selected based on feature similarity across different datasets. As shown in Table \ref{tab:feature_sample}, using feature cosine similarity, \alg can approximately create homophilic and heterophilic subgraphs from the original graph.
However, while features can approximately indicate whether neighboring nodes are of the same class, they are insufficient for accurate (multi-class) node classification, and it is crucial to take the graph structure into account. Otherwise, one could simply use an MLP classifier on node features. It is important to note that approximately identifying a homophilic and a heterophilic subgraph is a binary classification task, which is significantly easier than multi-class node classification. We show the insufficiency of node features for accurate classification without graph structure in Table \ref{tab:feature_performance}. We conducted additional experiments with an MLP classifier on various homophily and heterophily datasets, which show that the MLP performs poorly, particularly under heterophily.}
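
The feature-based split can be sketched as follows, under assumptions that we flag explicitly: $\mathbf{x}_v$ denotes the feature vector of node $v$, $\mathcal{E}$ the edge set, and $\theta_{\mathrm{hom}} \ge \theta_{\mathrm{het}}$ are hypothetical thresholds (for instance, quantiles chosen so that a fixed fraction of edges is retained). The exact selection rule used by \alg is the one given in the main text. For every edge $(u,v)\in\mathcal{E}$, the cosine similarity of the endpoint features is
\[
s_{uv} = \frac{\mathbf{x}_u^{\top}\mathbf{x}_v}{\|\mathbf{x}_u\|_2\,\|\mathbf{x}_v\|_2},
\]
and the two subgraphs keep the most and the least similar edges, respectively:
\[
\mathcal{E}^{hom} = \{(u,v)\in\mathcal{E} : s_{uv} \ge \theta_{\mathrm{hom}}\}, \qquad
\mathcal{E}^{het} = \{(u,v)\in\mathcal{E} : s_{uv} \le \theta_{\mathrm{het}}\}.
\]
Deciding whether an edge belongs to $\mathcal{E}^{hom}$ or $\mathcal{E}^{het}$ is a binary decision per edge, which is why it can succeed even when the features alone do not determine the class of a node. In Table \ref{tab:feature_sample}, for example, the feature-based selection raises the homophily ratio on Chameleon from $0.23$ in the original graph to $0.74$ in the homophilic subgraph.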

\begin{table}[h]
\centering
\caption{Homophily ratios of the subgraphs sampled via node features. After sampling, the homophilic subgraph has a higher homophily ratio than the original graph, while the heterophilic subgraph has a lower or comparable homophily ratio.}
\label{tab:feature_sample}
\begin{tabular}{lcccc}
\hline
            & \textbf{Cora(hom)} & \textbf{Citeseer(hom)} & \textbf{Chameleon(het)} & \textbf{Squirrel(het)} \\ \hline
\textbf{Original graph} & 0.83  & 0.71  & 0.23  & 0.19  \\
\textbf{Homophilic subgraph} & 0.87  & 0.82  & 0.74  & 0.42  \\
\textbf{Heterophilic subgraph} & 0.08  & 0.05  & 0.24  & 0.19  \\ \hline
\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{Node classification accuracy using node features only (MLP). Without the graph structure, the model achieves only sub-optimal performance.}
\label{tab:feature_performance}
\begin{tabular}{lcccc}
\hline
            & \textbf{Cora (7 classes)} & \textbf{Citeseer (6 classes)} & \textbf{Chameleon (5 classes)} & \textbf{Squirrel (5 classes)} \\ \hline
\textbf{MLP}  & $64.8 \pm 1.2$           & $66.5 \pm 1.0$                & $37.4 \pm 2.1$                 & $25.5 \pm 0.9$                 \\
\textbf{HLCL} & $84.1 \pm 1.0$           & $70.1 \pm 0.8$                & $50.9 \pm 1.0$                 & $42.9 \pm 2.6$                 \\ \hline
\end{tabular}
\end{table}

\subsection{Extended Related Work}
\label{sec:related}
\textbf{(Semi-)supervised learning on graphs.} In recent years, GNNs have become one of the most prominent tools for processing graph-structured data.  
In general, GNNs utilize the adjacency matrix to learn the node representations by aggregating information within every node's neighborhood \citep{defferrard2016convolutional,kipf2016semi}. Existing variants, including GraphSAGE \citep{hamilton2017inductive}, Graph Attention Networks (GAT) \citep{velivckovic2017graph}, MixHop \citep{abu2019mixhop}, SGC \citep{nt2019revisiting}, and GIN \citep{xu2018powerful},
learn a more general class of neighborhood mixing relationships by aggregating weighted information within a multi-hop neighborhood of every node. GNNs can generally be seen as applying a fixed, or a parametric and learnable (e.g., GAT), low-pass graph filter to graph signals. Those with trainable parameters can adapt to a wider range of frequency levels on different graphs. However, they still place a higher emphasis on lower-frequency signals and discard the high-frequency signals in a graph.
While the aggregation operation makes GNNs powerful tools for semi-supervised learning, it can make the learned node representations indistinguishable within a neighborhood \citep{nt2019revisiting}.
As a result, typical GNNs and their variants have long been criticized for their poor generalization performance under heterophily \citep{balcilar2020analyzing}.
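
To make the filtering view concrete, consider the following standard pair of graph filters, given here as an illustration and not necessarily the filters used by \alg. Let $A$ be the adjacency matrix, $D$ the degree matrix (assuming no isolated nodes), and $L = I - D^{-1/2} A D^{-1/2}$ the symmetric normalized Laplacian with eigenpairs $(\lambda_i, u_i)$, $\lambda_i \in [0,2]$; this notation is assumed for the sketch only. Then
\[
F_{LP} = I + D^{-1/2} A D^{-1/2} = 2I - L, \qquad
F_{HP} = I - D^{-1/2} A D^{-1/2} = L .
\]
For a graph signal $x = \sum_i c_i u_i$, the two filters rescale each frequency component as
\[
F_{LP}\, x = \sum_i (2-\lambda_i)\, c_i u_i, \qquad
F_{HP}\, x = \sum_i \lambda_i\, c_i u_i .
\]
Hence $F_{LP}$ retains the smooth components ($\lambda_i$ close to $0$) and suppresses the oscillating ones ($\lambda_i$ close to $2$), which makes neighboring representations more similar, whereas $F_{HP}$ does the opposite and emphasizes the differences between a node and its neighbors.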

\noindent\textbf{Graph self-supervised learning.}
Graph self-supervised learning methods have become a powerful tool for learning representations without any labels, and graph contrastive learning is the most successful and popular model structure. Numerous methods have been proposed in the field: \citet{velickovic2019deep,peng2020graph,hassani2020contrastive,zhu2021graph} focus on contrasting the global augmented representation with the local augmented representation, while \citet{zhu2020deep,you2020graph,qiu2020gcc,liu2022revisiting} contrast same-scale representations, global or local, in two augmented views. Due to the complexity of collecting negative samples in graph data, negative-sample-free contrastive objectives have also been studied \citep{thakoor2021large,bielak2021graph}.
However, the works mentioned above focus on encoding homophilic graphs and perform poorly on graphs with heterophily. Recently, a stream of self-supervised learning methods has been proposed to effectively learn node representations of heterophilic graphs without any labels. HGRL \citep{chen2022towards} improves the node representations on heterophilic graphs by preserving the original node features and rewiring informative nodes that are not in the local neighborhood. SP-GCL \citep{wang2022can} proposes using nodes from the T-hop neighborhood of a node with high feature similarities as positive pairs, without using any explicit augmentations. DSSL \citep{xiao2022decoupled} separates the heterogeneous patterns in local neighborhood distributions to capture both homophilic and heterophilic information globally. GREET \citep{liu2023beyond} discriminates homophilic edges from heterophilic edges using random-walk-based graph diffusion and contrasts the projected representations of the two graph views directly via a dual-channel contrastive loss. MUSE \citep{yuan2023muse} utilizes a semantic view contrast based on ego node feature perturbations and a contextual view contrast based on topology perturbations. It then integrates the representations learned from both contrasting views to construct a fusion contrast that combines structural and semantic information. NeCo \citep{he2023contrastive} proposes a new pretext task, group discrimination, which divides the nodes into $k$ groups and keeps the representations of nodes within a group consistent.

\noindent\textbf{Graph (semi-)supervised learning under heterophily.}
To address the over-smoothing issue of GNNs, recent methods propose other types of aggregation that better fit graphs with heterophily. Geom-GCN uses geometric aggregation in place of the typical aggregation \citep{pei2020geom}, H$_2$GCN uses several special model designs, including separate aggregation and higher-order neighborhood aggregation, to handle graphs with heterophily, and CPGNN trains a compatibility matrix to model the heterophily level \citep{zhu2020graph}.
More recently, \citet{wang2019demystifying} proposed to learn an aggregation filter for every graph from a set of base filters designed based on different ways of normalizing the adjacency matrix.
% designed based on three different normalization strategies of the adjacency matrix. 
% However, AFGNN does not consider adaptability in the frequency domain.
\cl{GGCN introduced degree corrections and signed message passing on GCN to address both the oversmoothing problem and the model's poor performance on heterophily graphs \citep{yan2021two}. \citet{zhu2021interpreting} analyzed and designed a unified framework for GNN propagation and proposed GNN-LF and GNN-HF, which preserve information of different frequencies separately by using different filtering kernels with learnable weights.}
FAGCN \citep{bo2021beyond} and FBGNN \citep{luan2020complete} train two \textit{separate} encoders to capture the high-pass and low-pass graph signals separately. Then they rely on labels to learn relatively complex mechanisms to combine the outputs of the encoders.
However, learning how to combine the encoder outputs is highly sensitive to having high-quality labels. This makes such methods highly impractical for unsupervised contrastive learning, where the label information is not available.
%%%%%%%%%%%%%%%%%%%%%
% Specifically, since under heterophily nodes in the same neighborhood have different labels, having access to labels is extremely beneficial for node classification. %under heterophily. 
% Importantly, existing methods based on graph filters leverage the labels to find the best combinations of the low- and high-pass filters that yields the best performance on a particular graph. In contrast in the unsupervised setting, 
% the label information is not available and it is not clear how the low-pass and high-pass filters should be combined to achieve a good performance. 

Unlike the above supervised methods, we apply the high-pass and low-pass filters to different subgraphs and contrast the resulting high-pass filtered and low-pass filtered node views in a self-supervised manner, without any labels. This is in contrast to learning the best combination of filtered signals of different encoders based on labels.
% in the \textit{same encoder} to generate augmented graph views that are \textit{contrasted} with their low-pass counterparts to learn unsupervised representations. 

% parameter tuning on different graphs. 
% This is very different from combining low-pass and high-pass filters to get the best classification performance for every individual graph, as is done in existing supervised methods. 
% In the output layer, the representations learned by the low-pass and high-pass channels are added with different weights to balance their relative importance. FBGNN and FAGCN achieve state-of-the-art on graphs with either homophily or heterophily. 
% \hy{However, the two-pass channel has only been applied in an end-to-end manner in supervised learning setting where high-quality labels are needed. In our study, we explore the potential of the high-pass and low-pass filter in a self-supervised learning setting, exploiting its structure with contrastive learning to learn meaningful representations on only only graphs with homophily but also with heterophily.}

% But, the balancing parameters only converge stably when there are sufficient labels.
% This prevents such methods to be applicable in scenarios where the labels are scarse, such as self- and semi-supervised learning.
% where most connected nodes are from different classes. 
% representation \citep{nt2019revisiting}. 

% \hy{comment}


% \subsection{\alg learns better representation}
% \label{sec:spectrum_study}

% In this section, we demonstrate the superiority of \alg in generating meaningful representations for heterophily graphs by contrasting both high-pass and low-pass filtered graph views. This approach differs from traditional Graph Contrastive Learning (GCL) methods like GRACE \cite{zhu2020deep}, which utilize only low-pass filtered views. We apply our methods to 'Chameleon', a popular heterophily dataset\cite{platonov2023critical}, and 'Citeseer', a well-known homophily dataset ~\cite{yang2016revisiting}. Following a similar approach to ~\cite{xue2022investigating}, we first conduct a spectral analysis on the output representation of \alg, which incorporates both high-pass and low-pass filtered representations. We then compare this with GRACE ~\cite{zhu2020deep}, which relies solely on low-pass filtered representation. To align the spectra more effectively, we normalize the eigenvalues into the range (0,1). Fig. \ref{fig:sp_alg_combined} illustrates this comparison. As shown in Fig. \ref{fig:chameleon_spectrum}, \alg's representation diminishes the smaller eigenvalues of the Jacobian, leading to a lower-rank structure. This reduction in noise enhances the output representation. Fig. \ref{fig:chameleon_alignment} further confirms the superiority of \alg's representation on heterophily graphs by demonstrating a stronger alignment between the eigenvectors of the Jacobian and the clean label vectors, contributing to higher classification accuracy. On the other hand, on homophily graphs, \alg exhibits a similar Jacobian low-rank structure and alignment between the eigenvectors and clean label vectors as GRACE. Thus, by contrasting low-pass with high-pass filtered graph view, \alg does not harm the performances on homophily graphs. 



% In this section, we demonstrate the superiority of \alg in generating meaningful representations for heterophily graphs by contrasting both the high-pass and low-pass filtered graph views, as opposed to traditional GCL methods like GRACE ~\cite{zhu2020deep}, which only utilize low-pass filtered views. We apply our methods on Chameleon , a popular heterophily dataset~\cite{platonov2023critical} and Citeseer, a popular homophily dataset ~\cite{yang2016revisiting}. We follow a similar setting to ~\cite{xue2022investigating}: First, we conduct a spectral analysis on the output representation of \alg, which used both high-pass and low-pass filtered representations, and compared that with GRACE ~\cite{zhu2020deep}, which used only low-pass filtered representation as the final representation. To better align the spectrum, we first normalized the eigenvalues into the range (0,1). Fig. \ref{fig:sp_alg_combined} shows the comparison. As illustrated in Fig. \ref{fig:chameleon_spectrum}, \alg's representation shrinks the smaller eigenvalues of the Jacobian and has a lower-rank structure. This signify the reducement of noises in the output representation and help better classification. Fig. \ref{fig:chameleon_alignment} further confirm the superiority of the \alg representation on heterophily graph by showing a stronger alignment between the eigenvectors of the Jacobian and the clean label vectors, which contributes to its higher classification accuracy. On the other hand, on homophily graph, \alg shows a similar Jacobian low-rank structure as well as a similar alignment between the eigenvectors and the clean label vectors with GRACE. 




% First, we conduct a spectral analysis on the output representation of \alg and GRACE. As shown in Figure \ref{fig:spectrum}, \alg produces a representation matrix with a lower rank structure and larger largest-eigenvalue components. These characteristics contribute to \alg's superior performance on heterophily graphs, which correspond to the conclusions on the benefits of the low-rank representations drawn from ~\cite{xue2022investigating}. Then, we compare the alignment of their representation spectrums to the class label vectors, as shown in Figure \ref{fig:alignment}. Again, \alg's representation shows a better alignment to the class labels, which contributes to its higher classification accuracy.

% \begin{figure}[htbp]
%   \centering
%   \begin{subfigure}[b]{0.48\columnwidth}
%     \includegraphics[width=\linewidth]{IJCAI/Fig/chameleon_spectrum_updated.png}
%     \caption{}
%     \label{fig:chameleon_spectrum}
%   \end{subfigure}
%   \hfill
%   \begin{subfigure}[b]{0.48\columnwidth}
%     \includegraphics[width=\linewidth]{IJCAI/Fig/chameleon_alignment_updated.png}
%     \caption{}
%     \label{fig:chameleon_alignment}
%   \end{subfigure}
%   \begin{subfigure}[b]{0.48\columnwidth}
%     \includegraphics[width=\linewidth]{IJCAI/Fig/citeseer_spectrum_updated.png}
%     \caption{}
%     \label{fig:citeseer_spectrum}
%   \end{subfigure}
%   \hfill
%   \begin{subfigure}[b]{0.48\columnwidth}
%     \includegraphics[width=\linewidth]{IJCAI/Fig/citeseer_alignment_updated.png}
%     \caption{}
%     \label{fig:citeseer_alignment}
%   \end{subfigure}
  
%   % Shared caption and label for the merged figure
%   \caption{(a) and (c) display the distribution of eigenvalues in the Jacobian matrices of the Chameleon and Citeseer datasets, respectively. (b) and (d) illustrate the alignment of the clean labels with the eigenvectors for Chameleon and Citeseer. In both (a) and (c), the representations exhibit eigenvalues at 1, resulting in an overlap.}
%   \label{fig:sp_alg_combined}.
% \end{figure}




% \hy{comment}


% \begin{table*}[t]
%   \centering
% {\small %\footnotesize 
% \caption{Performance of using high-pass (HP) and low-pass (LP) filter when we have the perfect homophilic subgraph and heterophilic subgraph. }\label{tab:subgraph_ideal}
% \vspace{-1mm}
%   \begin{tabular} {c c | c c}
%     \toprule
% %\multirow{2}{4em}{Method}
% &\multicolumn{1}{c}{Homophily}&\multicolumn{2}{|c}{Heterophily}\\\cmidrule{1-4}
%  &Cora  & Chameleon & Squirrel \\\midrule
% \multicolumn{1}{c|}{\textbf{\alg}}  & 82.50 & 48.31 & 39.46 \\\midrule
% \multicolumn{1}{c|}{\alg with ideal $\mathcal{G}^{hom}, \mathcal{G}^{het}$}  & 89.71 & 61.58 & 47.41  \\
% \multicolumn{1}{c|}{LP on ideal $\mathcal{G}^{hom}$}  & 87.13 & 53.70 & 44.91  \\
% \multicolumn{1}{c|}{HP on ideal $\mathcal{G}^{het}$}  & 63.60 & 58.90  & 39.90  \\
% \bottomrule
% \end{tabular}
% }
% \end{table*}

% \begin{figure*}
%     \centering
%     \begin{subfigure}{0.25\textwidth}
%         \centering
%         \includegraphics[width=\textwidth]{Fig/chameleon_OMG.pdf}
%         \caption{Homophily neighborhood }
%     \end{subfigure}\hspace{1cm}
%     \begin{subfigure}{0.25\textwidth}
%         \centering
%         \includegraphics[width=\textwidth]{Fig/chameleon_OTG.pdf}
%         \caption{Heterophily neighborhood }
%     \end{subfigure}
%     \vspace{-2mm}
%     \caption{ (Fig 6) Chameleon ($\beta=0.23$). Heterophily graphs contain neighborhoods with homogeneous and heterogeneous labels. 
%     % Thus, replacing a low-pass filter with a high-pass filter will not achieve higher performance under heterophily. 
%     }
%     \label{fig:mix_graph}
% \end{figure*}


% \vspace{2mm}
% \noindent\textbf{Contrastive learning on graphs.}
% For contrastive learning on graphs, global graph-level and local node-level data are augmented and contrasted in different ways. DGI ~\cite{velickovic2019deep} and GMI ~\cite{peng2020graph} contrast graph and node representations within one augmented view of the original graph. More recent methods contrast global and local representations in two augmented views. \textsc{GraphCL} generates graph augmentations by subgraph sampling, node dropping, and edge perturbation and contrasts the augmented graph representations. GCC samples and contrasts subgraphs of the original graph ~\cite{qiu2020gcc}. MVGRL leverages node diffusion to augment the graph and contrasts the node representations ~\cite{hassani2020contrastive}. 
% Contrasting the local node representations has been shown to achieve state-of-the-art. \textsc{GRACE} contrasts the node representations in two graph views augmented with feature masking and edge removal ~\cite{zhu2020deep}.
% GCA extends this by dropping the less important edges and features, based on node centrality and feature importance metrics
% ~\cite{zhu2021graph}. A thorough empirical study on the combinatorial effect of different augmentations has been conducted by \cite{zhu2021empirical}. 
% Due to the complexity of collecting negative samples in graph data, negative-samples-free 
% contrastive objectives have been also studied. Among existing methods, BGRL that uses the Bootstrapping Latent loss ~\cite{thakoor2021large}, and GBT uses Barlow Twins loss ~\cite{bielak2021graph} are the most successful.
% Existing graph CL methods explicitly augment the input graph and contrast the augmented graph representations obtained with {low-pass} GNN-based encoders. In doing so, they only capture the similarity of nodes in a neighborhood. Hence, they perform poorly on graphs with heterophily.To address this, recently, HGRL \cite{chen2022towards} proposed rewiring the entire graph first to drop edges connecting nodes in different classes and add edges connecting nodes in the same class.
% Notably, instead of using a GNN encoder, HGRL leverages an MLP to avoid low-pass aggregation on edges connecting different classes. It also learns different weights on edges in the multi-hop neighborhood to capture more information in the graph.
% SP-GCL \cite{wang2022can} proposed using more positive pairs from the T-hop neighborhood of a node, without using any explicit augmentations.
% \noindent In contrast, we leverage low-pass and high-pass graph filters in the same GNN-based encoder to capture and contrast similarity and dissimilarity of nodes with their neighborhood.
% This allows achieving state-of-the-art under heterophily.

% \subsection{Subgraph Update Interval}

% We conduct an ablation study on the interval $T$ between subgraph updates. The results are shown in Fig. 5. Both frequent updates ($T=10$) and no updates at all degrade performance on homophilic and heterophilic graphs alike. Never updating the subgraphs leads to overfitting their inaccuracies, while updating them too frequently does not allow the information to be aggregated and learned effectively. A moderate update interval yields the best performance.



% \begin{table}[ht]
% \centering
% \begin{tabular}{lcccc}
% \hline
% Dataset    & T=10  & T=50   & T=250  & No Update \\
% \hline
% cora       & 80.51 & \textbf{83.46} & 83.08 & 82.13 \\
% chameleon  & 41.57 & \textbf{48.31} & 48.31 & 42.70 \\
% \hline
% \end{tabular}
% \captionof{figure}{Performance comparison for different update intervals}\label{fig:ablation_T}
% \end{table}
% \subsection{Impact of edge proportion on \alg performance}

% Regarding the proportion of edges used for each view, we conducted a detailed study to explore the impact of varying $k_1$ on the performance. As observed, different $k_1$ (and $k_2$) values can have a significant influence on the performance. The performance of the Cora dataset varies by 30\% with different $k$ values, while the performance of the Chameleon dataset varies by 7\%. Based on the results, the homophily ratio of the graph is a good indicator of the appropriate $k_1$ (and $k_2$) values. In practice, one can sample a small subgraph and measure its homophily ratio for easier tuning.

% % \begin{table}[ht]
% % \centering
% % \caption{Performance Variation with Different $k_1$ Values}
% % \begin{tabular}{lccccc}
% % \hline
% % Dset & 0.9 & 0.8 & 0.5 & 0.2 & 0.1\\
% % \hline
% % Cora       & 83.46 & 80.15 & 72.06 & 54.78 & 53.67 \\
% % Cham  & 41.95 & 41.94 & 41.57 & 48.31 & 44.95 \\
% % \hline
% % \end{tabular}
% % \end{table}
% \clearpage
\section{Proof}\label{sec:proof}
\newtheorem{assumption}{Assumption}
\begin{assumption} \label{assump:1}
Let \(\pmb{X}\) be the feature matrix of \(\mathcal{G}^{\text{hom}}\) and \(\pmb{W}\) be the learnable weights of the GNN encoder. Then,
\[
\pmb{X} \pmb{W} \pmb{W} \pmb{X} = w_0 \pmb{I} + w_1 \pmb{A}^{\text{hom}} + w_2 (\pmb{A}^{\text{hom}})^2 + \dots + w_j (\pmb{A}^{\text{hom}})^j.
\]
\end{assumption}

Under homophily, $\pmb{XWWX}$ captures the pairwise feature similarities between nodes in the subgraph after passing through the low-pass graph filter. Assumption \ref{assump:1} expands $\pmb{XWWX}$ as a weighted sum of powers of $\pmb{A}^{\text{hom}}$. Here, $w_k$ is the weight of the $k$-th power $(\pmb{A}^{\text{hom}})^k$, whose $(i,j)$ entry counts the length-$k$ walks between nodes $i$ and $j$. For homophilic subgraphs, the weights of the lower-order terms ($\pmb{A}^{\text{hom}}$, $(\pmb{A}^{\text{hom}})^2$, etc.) are higher, because the homophily principle \citep{mcpherson2001birds,luan2020complete} implies that nodes within closer neighborhoods exhibit greater feature similarities. After projection, the similarities also become higher \citep{zhang2018arbitrary}.
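
As a small illustration of this counting interpretation (a toy example chosen only for exposition), consider the path graph on three nodes with edges $1$--$2$ and $2$--$3$:
\[
\pmb{A} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
\pmb{A}^2 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
\]
The entry $(\pmb{A}^2)_{13} = 1$ corresponds to the single length-2 walk $1 \to 2 \to 3$, and $(\pmb{A}^2)_{22} = 2$ corresponds to the two walks $2 \to 1 \to 2$ and $2 \to 3 \to 2$. The entries of $\pmb{A}^k$ thus count walks, in which nodes may repeat.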

\begin{assumption} \label{assump:2}
Let \(\pmb{X}\) be the feature matrix of \(\mathcal{G}^{\text{het}}\) and \(\pmb{W}\) be the learnable weights of the GNN encoder. Then,
\[
\pmb{X} \pmb{W} \pmb{W} \pmb{X} = w_0 \pmb{I} + w_1 \pmb{L}^{\text{het}} + w_2 (\pmb{L}^{\text{het}})^2 + \dots + w_j (\pmb{L}^{\text{het}})^j.
\]
\end{assumption}

Under heterophily, $\pmb{XWWX}$ captures the pairwise feature dissimilarities between nodes in the subgraph after passing through the high-pass graph filter. Assumption \ref{assump:2} expands $\pmb{XWWX}$ as a weighted sum of powers of $\pmb{L}^{\text{het}}$, where $w_k$ is the weight of the $k$-th power $(\pmb{L}^{\text{het}})^k$.
In contrast to homophilic graphs, for heterophilic subgraphs, the closer the nodes are, the more dissimilar they are \citep{zhu2020deep}. 

\newtheorem{lemma}{Lemma}

\begin{lemma} \label{lemma:1}
Let \( \pmb{A} \) and \( \widetilde{\pmb{A}} \) be adjacency matrices of the target graph and its augmented counterpart. Suppose that \( \pmb{A} \) and \( \widetilde{\pmb{A}} \) have the same eigenspaces, and let \( \pmb{D} \) and \( \widetilde{\pmb{D}} \) be the corresponding degree matrices, where \( \pmb{D} = \widetilde{\pmb{D}} \). Then the Laplacian matrices \( \pmb{L} \) and \( \widetilde{\pmb{L}} \) have the same eigenspaces.
\end{lemma}
\begin{proof}
Given that \( \pmb{A} \) and \( \widetilde{\pmb{A}} \) have the same eigenspaces, there exists an orthogonal matrix \( \pmb{Q} \) such that:
\[ \pmb{A} = \pmb{Q} \pmb{\Lambda} \pmb{Q}^T \quad \text{and} \quad \widetilde{\pmb{A}} = \pmb{Q} \widetilde{\pmb{\Lambda}} \pmb{Q}^T \]
where \( \pmb{\Lambda} \) and \( \widetilde{\pmb{\Lambda}} \) are diagonal matrices containing the eigenvalues of \( \pmb{A} \) and \( \widetilde{\pmb{A}} \), respectively.
Since \( \pmb{D} = \widetilde{\pmb{D}} \), we write \( \pmb{D} \) for both degree matrices.
The Laplacian matrices are defined as:
\[ \pmb{L} = \pmb{D} - \pmb{A} \quad \text{and} \quad \widetilde{\pmb{L}} = \pmb{D} - \widetilde{\pmb{A}} \]
Substituting the spectral decompositions of \( \pmb{A} \) and \( \widetilde{\pmb{A}} \), we have:
\[ \pmb{L} = \pmb{D} - \pmb{Q} \pmb{\Lambda} \pmb{Q}^T \]
\[ \widetilde{\pmb{L}} = \pmb{D} - \pmb{Q} \widetilde{\pmb{\Lambda}} \pmb{Q}^T \]
Both \( \pmb{L} \) and \( \widetilde{\pmb{L}} \) can be written as:
\[ \pmb{L} = \pmb{Q} (\pmb{Q}^T \pmb{D} \pmb{Q} - \pmb{\Lambda}) \pmb{Q}^T \]
\[ \widetilde{\pmb{L}} = \pmb{Q} (\pmb{Q}^T \pmb{D} \pmb{Q} - \widetilde{\pmb{\Lambda}}) \pmb{Q}^T \]
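Assume further that \( \pmb{Q}^T \pmb{D} \pmb{Q} \) is diagonal. This holds, for example, when \( \pmb{D} \) is a multiple of the identity, as for regular graphs or for the symmetrically normalized operators, where \( \pmb{L} = \pmb{I} - \pmb{A} \); it does not hold for an arbitrary diagonal \( \pmb{D} \) and orthogonal \( \pmb{Q} \). Let \( \pmb{D}' = \pmb{Q}^T \pmb{D} \pmb{Q} \), then: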
\[ \pmb{L} = \pmb{Q} (\pmb{D}' - \pmb{\Lambda}) \pmb{Q}^T \]
\[ \widetilde{\pmb{L}} = \pmb{Q} (\pmb{D}' - \widetilde{\pmb{\Lambda}}) \pmb{Q}^T \]
Since \( \pmb{D}' - \pmb{\Lambda} \) and \( \pmb{D}' - \widetilde{\pmb{\Lambda}} \) are diagonal, their diagonal entries are the eigenvalues of \( \pmb{L} \) and \( \widetilde{\pmb{L}} \), respectively, and the columns of \( \pmb{Q} \) are eigenvectors of both matrices.
Thus, \( \pmb{L} \) and \( \widetilde{\pmb{L}} \) have the same eigenspaces.
\end{proof}
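
As an illustrative special case, assume the symmetrically normalized operators, for which \( \pmb{L} = \pmb{I} - \pmb{A} \) (this normalization is an assumption of the example, not a step of the proof above). Then
\[
\pmb{L} = \pmb{Q}\pmb{Q}^T - \pmb{Q}\pmb{\Lambda}\pmb{Q}^T = \pmb{Q}(\pmb{I} - \pmb{\Lambda})\pmb{Q}^T, \qquad \widetilde{\pmb{L}} = \pmb{Q}(\pmb{I} - \widetilde{\pmb{\Lambda}})\pmb{Q}^T,
\]
so every eigenvector of \( \pmb{A} \) with eigenvalue \( \lambda_i \) is an eigenvector of \( \pmb{L} \) with eigenvalue \( 1 - \lambda_i \). In particular, adjacency eigenvalues in \( (-1, 1] \) map to Laplacian eigenvalues in \( [0, 2) \).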

\subsection{Theorem \ref{the:major} [\alg: Spectral Invariance]}
% \begin{theorem*}
Given a graph \( \mathcal{G} \), we infer from it a homophilic subgraph \( \mathcal{G}_{\text{hom}} \) and a heterophilic subgraph \( \mathcal{G}_{\text{het}} \), with augmented counterparts \( \tilde{\mathcal{G}}_{\text{hom}} \) and \( \tilde{\mathcal{G}}_{\text{het}} \). For graph augmentations, we follow \citep{liu2022revisiting}, so that the adjacency matrices \(\pmb{A}_{\text{hom}}\) and \( \tilde{\pmb{A}}_{\text{hom}} \) of the homophilic subgraph and its augmentation share the same eigenspaces. Similarly, the adjacency matrices \(\pmb{A}_{\text{het}}\) and \( \tilde{\pmb{A}}_{\text{het}} \) of the heterophilic subgraph and its augmentation share the same eigenspaces. By Lemma \ref{lemma:1}, the Laplacian matrices \(\pmb{L}_{\text{hom}}\) and \( \tilde{\pmb{L}}_{\text{hom}} \) share the same eigenspaces, as do \(\pmb{L}_{\text{het}}\) and \( \tilde{\pmb{L}}_{\text{het}} \).

We establish the following lower bound:

\[
\mathcal{L}_\text{\alg} \geq \frac{-1 - N}{2} \sum_i \left( \alpha_{\pmb{A}_i} \left(2 - (\lambda_{\pmb{A}_{i}^{\text{hom}}} - \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}})^2 \right) + \alpha_{\pmb{L}_i} \left(4 - (\lambda_{\pmb{L}_{i}^{\text{het}}} - \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}})^2 \right) \right)
\]

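where \(N\) is the number of nodes, \(\lambda_{\pmb{A}_{i}^{\text{hom}}}\) and \(\lambda_{\pmb{L}_{i}^{\text{het}}}\) denote the \(i\)-th eigenvalues of the homophilic subgraph low-pass filter and the heterophilic subgraph high-pass filter, respectively, \(\lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}}\) and \(\lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}}\) denote the corresponding eigenvalues for the augmented subgraphs, and \(\alpha_{\pmb{A}_i}\) and \(\alpha_{\pmb{L}_i}\) denote the adaptive weights for the \(i\)-th adjacency and Laplacian matrix components. 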
\begin{proof}
{By minimizing the \alg loss, we minimize the losses for contrasting the augmented views of both the heterophilic and the homophilic subgraphs. Hence, we discuss each in our proof.}

{For simplicity, since the \alg loss is symmetric, we use only one graph view as the anchor view.}

\begin{align}
\mathcal{L} &= -\frac{1}{2N} \sum_{i=1}^N \left(l(\pmb{z}_l^i, \tilde{\pmb{z}}_l^i) + l(\pmb{z}_h^i, \tilde{\pmb{z}}_h^i)\right) \\
&= -\frac{1}{2N} \sum_{i=1}^N \left(\log \frac{e^{\pmb{z}_l^i \tilde{\pmb{z}}_l^i{}^T}}{
e^{\pmb{z}_l^i \tilde{\pmb{z}}_l^i{}^T}
+ \sum_{\substack{k\in[N],\\k \neq i}} e^{\pmb{z}_l^i \tilde{\pmb{z}}_l^k{}^T}} + \log \frac{e^{ \pmb{z}_h^i \tilde{\pmb{z}}_h^i{}^T}}{
e^{\pmb{z}_h^i \tilde{\pmb{z}}_h^i{}^T}
+ \sum_{\substack{k\in[N],\\k \neq i}} e^{\pmb{z}_h^i \tilde{\pmb{z}}_h^k{}^T}}\right) \\
&= -\frac{1}{2N} \sum_{i=1}^N \left(\pmb{z}_l^i \tilde{\pmb{z}}_l^i{}^T + \pmb{z}_h^i \tilde{\pmb{z}}_h^i{}^T - \log \sum_{k=1}^N e^{\pmb{z}_l^i \tilde{\pmb{z}}_l^k{}^T} - \log \sum_{k=1}^N e^{\pmb{z}_h^i \tilde{\pmb{z}}_h^k{}^T}\right) \\
&\geq -\frac{1}{2N} \sum_{i=1}^N \left(\pmb{z}_l^i \tilde{\pmb{z}}_l^i{}^T + \pmb{z}_h^i \tilde{\pmb{z}}_h^i{}^T - \log \left(N \cdot e^{\frac{1}{N}\sum_{k=1}^N \pmb{z}_l^i \tilde{\pmb{z}}_l^k{}^T}\right) - \log \left(N \cdot e^{\frac{1}{N}\sum_{k=1}^N \pmb{z}_h^i \tilde{\pmb{z}}_h^k{}^T}\right)\right) \\
& \equiv -\sum_{i=1}^N \left(\pmb{z}_l^i \tilde{\pmb{z}}_l^i{}^T + \pmb{z}_h^i \tilde{\pmb{z}}_h^i{}^T - \frac{1}{N} \sum_{k=1}^N \left(\pmb{z}_l^i \tilde{\pmb{z}}_l^k{}^T + \pmb{z}_h^i \tilde{\pmb{z}}_h^k{}^T\right)\right) \\
& = -\left(\mathrm{tr}(\pmb{Z}_l \tilde{\pmb{Z}_l}^T) + \mathrm{tr}(\pmb{Z}_h \tilde{\pmb{Z}_h}^T) - \frac{1}{N}\mathrm{sum}(\pmb{Z}_l \tilde{\pmb{Z}_l}^T) - \frac{1}{N}\mathrm{sum}(\pmb{Z}_h \tilde{\pmb{Z}_h}^T)\right)\label{eq:tr}
\end{align}
Here, \(\mathrm{tr}(\cdot)\) denotes the trace and \(\mathrm{sum}(\cdot)\) the sum of all entries of a matrix. \(\pmb{Z}_l\) and \(\tilde{\pmb{Z}}_l\) are the projected representations of \(\mathcal{G}_{\text{hom}}\) and \(\tilde{\mathcal{G}}_{\text{hom}}\), and \(\pmb{Z}_h\) and \(\tilde{\pmb{Z}}_h\) are the projected representations of \(\mathcal{G}_{\text{het}}\) and \(\tilde{\mathcal{G}}_{\text{het}}\).
As mentioned before, \(\pmb{A}_{\text{hom}}\) and \(\tilde{\pmb{A}}_{\text{hom}}\) share the same eigenspaces, so \(\pmb{A}_{\text{hom}} = \pmb{Q}_{\text{hom}} \pmb{\Lambda}_{\text{hom}} \pmb{Q}_{\text{hom}}^T\) and \(\tilde{\pmb{A}}_{\text{hom}} = \pmb{Q}_{\text{hom}} \tilde{\pmb{\Lambda}}_{\text{hom}} \pmb{Q}_{\text{hom}}^T\), where the columns of \(\pmb{Q}_{\text{hom}}\) are the shared eigenvectors, and \(\pmb{\Lambda}_{\text{hom}} = \text{diag}(\lambda_{\pmb{A}_{i}^{\text{hom}}})\) and \(\tilde{\pmb{\Lambda}}_{\text{hom}} = \text{diag}(\lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}})\) are the corresponding diagonal eigenvalue matrices. Similarly, \(\pmb{L}_{\text{het}} = \pmb{Q}_{\text{het}} \pmb{\Lambda}_{\text{het}} \pmb{Q}_{\text{het}}^T\) and \(\tilde{\pmb{L}}_{\text{het}} = \pmb{Q}_{\text{het}} \tilde{\pmb{\Lambda}}_{\text{het}} \pmb{Q}_{\text{het}}^T\), where the columns of \(\pmb{Q}_{\text{het}}\) are the shared eigenvectors, and \(\pmb{\Lambda}_{\text{het}} = \text{diag}(\lambda_{\pmb{L}_{i}^{\text{het}}})\) and \(\tilde{\pmb{\Lambda}}_{\text{het}} = \text{diag}(\lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}})\). With the simplification of the \alg loss, we have \(\pmb{Z}_h \tilde{\pmb{Z}_h}^T = \pmb{LXWWX\tilde{L}}\) and \(\pmb{Z}_l \tilde{\pmb{Z}_l}^T = \pmb{AXWWX\tilde{A}}\), where \(\pmb{W}\) denotes the learnable parameters of the encoder.
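
The inequality in the derivation above is an instance of Jensen's inequality for the convex exponential function. Writing \(a_k\) as a shorthand (introduced here only for illustration) for the similarity \(\pmb{z}^i \tilde{\pmb{z}}^k{}^T\) in either view, we have
\[
\frac{1}{N}\sum_{k=1}^N e^{a_k} \geq e^{\frac{1}{N}\sum_{k=1}^N a_k}
\quad\Longrightarrow\quad
\log \sum_{k=1}^N e^{a_k} \geq \log N + \frac{1}{N}\sum_{k=1}^N a_k .
\]
Since the log-sum-exp terms enter the bracketed expression with a negative sign and the bracket is multiplied by \(-\frac{1}{2N}\), replacing them with this smaller quantity yields a lower bound on \(\mathcal{L}\). The additive \(\log N\) terms and the positive factor \(\frac{1}{2N}\) do not affect the minimizer, which is why they are dropped in the subsequent step.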

\begin{lemma}\label{lemma:2}
Under Assumption \ref{assump:1}, for the homophilic subgraph \(\mathcal{G}^{\text{hom}}\), when \( j \geq N-1 \), \(\pmb{XWWX} = w_0 \pmb{I} + w_1\pmb{A}^{\text{hom}} + w_2(\pmb{A}^{\text{hom}})^2 + \dots + w_j(\pmb{A}^{\text{hom}})^j = \pmb{Q}_{\text{hom}} \mathrm{\pmb{A}_{\text{hom}}} \pmb{Q}_{\text{hom}}^T\), where \(\mathrm{\pmb{A}_{\text{hom}}} = \text{diag}(\alpha_{\pmb{A}_1}, \ldots, \alpha_{\pmb{A}_N})\) is the diagonal matrix of adaptive weights (not to be confused with the adjacency matrix \(\pmb{A}_{\text{hom}}\)). Here, \(\alpha_{\pmb{A}_1}, \ldots, \alpha_{\pmb{A}_N}\) are \( N \) different parameters if \(\lambda_{\pmb{A}_{1}^{\text{hom}}}, \ldots, \lambda_{\pmb{A}_{N}^{\text{hom}}}\) are \( N \) different frequency amplitudes.
\end{lemma}
\begin{proof}
The proof can be found in Theorem 4 of \citep{liu2022revisiting}.
\end{proof}
\begin{lemma}\label{lemma:3}
Under Assumption \ref{assump:2}, for the heterophilic subgraph \(\mathcal{G}^{\text{het}}\), when \( j \geq N-1 \), \(\pmb{XWWX} = w_0 \pmb{I} + w_1 \pmb{L}^{\text{het}} + w_2 (\pmb{L}^{\text{het}})^2 + \dots + w_j (\pmb{L}^{\text{het}})^j = \pmb{Q}_{\text{het}} \mathrm{\pmb{A}_{\text{het}}} \pmb{Q}_{\text{het}}^T\), where \(\mathrm{\pmb{A}_{\text{het}}} = \text{diag}(\alpha_{\pmb{L}_1}, \ldots, \alpha_{\pmb{L}_N})\) is the diagonal matrix of adaptive weights. Here, \(\alpha_{\pmb{L}_1}, \ldots, \alpha_{\pmb{L}_N}\) are \( N \) different parameters if \(\lambda_{\pmb{L}_{1}^{\text{het}}}, \ldots, \lambda_{\pmb{L}_{N}^{\text{het}}}\) are \( N \) different frequency amplitudes.
\end{lemma}

\begin{proof}
The proof follows that of Theorem 4 of \citep{liu2022revisiting}, with $\pmb{L}$ in place of $\pmb{A}$ as the decomposed matrix. 
\end{proof}
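
The role of the condition \( j \geq N-1 \) in Lemmas \ref{lemma:2} and \ref{lemma:3} can be seen from a short calculation, sketched here for a generic symmetric matrix \(\pmb{M} = \pmb{Q}\pmb{\Lambda}\pmb{Q}^T\) with eigenvalues \(\lambda_1, \ldots, \lambda_N\) (\(\pmb{M}\) and the polynomial \(p\) are notation introduced only for this illustration). Because \(\pmb{M}^k = \pmb{Q}\pmb{\Lambda}^k\pmb{Q}^T\),
\[
\sum_{k=0}^{j} w_k \pmb{M}^k = \pmb{Q}\,\text{diag}\bigl(p(\lambda_1), \ldots, p(\lambda_N)\bigr)\,\pmb{Q}^T, \qquad p(\lambda) = \sum_{k=0}^{j} w_k \lambda^k .
\]
Each diagonal weight is therefore \(\alpha_i = p(\lambda_i)\). When the \(\lambda_i\) are distinct, a polynomial of degree \(N-1\) can interpolate any prescribed values at these \(N\) points, so for \( j \geq N-1 \) the coefficients \(w_0, \ldots, w_j\) can realize any choice of \(\alpha_1, \ldots, \alpha_N\).
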
For \( \pmb{Z}_l \tilde{\pmb{Z}_l}^T \),
using Lemma \ref{lemma:2}, we have:

\[
\begin{aligned}
\pmb{Z}_l \tilde{\pmb{Z}_l}^T &= \pmb{AXWWX\tilde{A}} \\
&= \pmb{Q}_{\text{hom}} \pmb{\Lambda}_{\text{hom}} \pmb{Q}_{\text{hom}}^T \pmb{Q}_{\text{hom}} \mathrm{\pmb{A}_{\text{hom}}} \pmb{Q}_{\text{hom}}^T \pmb{Q}_{\text{hom}} \tilde{\pmb{\Lambda}}_{\text{hom}} \pmb{Q}_{\text{hom}}^T \\
&= \pmb{Q}_{\text{hom}} \pmb{\Lambda}_{\text{hom}}\mathrm{\pmb{A}_{\text{hom}}}\tilde{\pmb{\Lambda}}_{\text{hom}}\pmb{Q}_{\text{hom}}^T \\
&= \pmb{Q}_{\text{hom}} \begin{bmatrix}
\lambda_{\pmb{A}_{1}^{\text{hom}}}\alpha_{\pmb{A}_1}\lambda_{\tilde{\pmb{A}}_{1}^{\text{hom}}} & 0 & \cdots & 0 \\
0 & \lambda_{\pmb{A}_{2}^{\text{hom}}}\alpha_{\pmb{A}_2}\lambda_{\tilde{\pmb{A}}_{2}^{\text{hom}}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_{\pmb{A}_{N}^{\text{hom}}}\alpha_{\pmb{A}_N}\lambda_{\tilde{\pmb{A}}_{N}^{\text{hom}}}
\end{bmatrix} \pmb{Q}_{\text{hom}}^T \\
&= \sum_{i=1}^N \lambda_{\pmb{A}_{i}^{\text{hom}}} \alpha_{\pmb{A}_i} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}} q_{\pmb{A}_i} q_{\pmb{A}_i}^T,
\end{aligned}
\]
where $q_{\pmb{A}_i}$ is the $i$-th column of the matrix \(\pmb{Q}_{\text{hom}}\). 

For \( \pmb{Z}_h \tilde{\pmb{Z}_h}^T \),
using Lemma \ref{lemma:3}, we have:
\[
\begin{aligned}
\pmb{Z}_h \tilde{\pmb{Z}_h}^T &= \pmb{LXWWX\tilde{L}} \\
&= \pmb{Q}_{\text{het}} \pmb{\Lambda}_{\text{het}} \pmb{Q}_{\text{het}}^T \pmb{Q}_{\text{het}} \mathrm{\pmb{A}_{\text{het}}} \pmb{Q}_{\text{het}}^T \pmb{Q}_{\text{het}} \tilde{\pmb{\Lambda}}_{\text{het}} \pmb{Q}_{\text{het}}^T \\
&= \pmb{Q}_{\text{het}} \pmb{\Lambda}_{\text{het}}\mathrm{\pmb{A}_{\text{het}}}\tilde{\pmb{\Lambda}}_{\text{het}}\pmb{Q}_{\text{het}}^T \\
&= \pmb{Q}_{\text{het}} \begin{bmatrix}
\lambda_{\pmb{L}_{1}^{\text{het}}}\alpha_{\pmb{L}_1}\lambda_{\tilde{\pmb{L}}_{1}^{\text{het}}} & 0 & \cdots & 0 \\
0 & \lambda_{\pmb{L}_{2}^{\text{het}}}\alpha_{\pmb{L}_2}\lambda_{\tilde{\pmb{L}}_{2}^{\text{het}}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_{\pmb{L}_{N}^{\text{het}}}\alpha_{\pmb{L}_N}\lambda_{\tilde{\pmb{L}}_{N}^{\text{het}}}
\end{bmatrix} \pmb{Q}_{\text{het}}^T \\
&= \sum_{i=1}^N \lambda_{\pmb{L}_{i}^{\text{het}}} \alpha_{\pmb{L}_i} \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}} q_{\pmb{L}_i} q_{\pmb{L}_i}^T,
\end{aligned}
\]
where $q_{\pmb{L}_i}$ is the $i$-th column of the matrix \(\pmb{Q}_{\text{het}}\). Therefore, we have:
\[
\text{tr}(\pmb{Z}_l \tilde{\pmb{Z}_l}^T) = \sum_{i=1}^N \lambda_{\pmb{A}_{i}^{\text{hom}}} \alpha_{\pmb{A}_i} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}}, \quad \mathrm{sum}(\pmb{Z}_l \tilde{\pmb{Z}_l}^T) = \sum_{i=1}^N \lambda_{\pmb{A}_{i}^{\text{hom}}} \alpha_{\pmb{A}_i} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}} \mathrm{sum}(q_{\pmb{A}_i} q_{\pmb{A}_i}^T)
\]
\[
\text{tr}(\pmb{Z}_h \tilde{\pmb{Z}_h}^T) = \sum_{i=1}^N \lambda_{\pmb{L}_{i}^{\text{het}}} \alpha_{\pmb{L}_i} \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}}, \quad \mathrm{sum}(\pmb{Z}_h \tilde{\pmb{Z}_h}^T) = \sum_{i=1}^N \lambda_{\pmb{L}_{i}^{\text{het}}} \alpha_{\pmb{L}_i} \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}} \mathrm{sum}(q_{\pmb{L}_i} q_{\pmb{L}_i}^T)
\]
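Both expressions follow from applying the trace and the entry sum to each rank-one term. For any unit vector \(q\) (with \(\pmb{1}\) denoting the all-ones vector, a symbol introduced here only for illustration),
\[
\text{tr}(q q^T) = q^T q = 1, \qquad \mathrm{sum}(q q^T) = \sum_{a=1}^N \sum_{b=1}^N q_a q_b = (\pmb{1}^T q)^2 .
\]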
By substituting these expressions for the trace and the entry sum into Eq. \eqref{eq:tr}, we have 
\begin{align*}
\mathcal{L}_{HLCL} &\geq -\left(\sum_{i=1}^N \left(\lambda_{\pmb{A}_{i}^{\text{hom}}} \alpha_{\pmb{A}_i} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}} + \lambda_{\pmb{L}_{i}^{\text{het}}} \alpha_{\pmb{L}_i} \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}}\right) - \frac{1}{N}\sum_{i=1}^N \left(\lambda_{\pmb{A}_{i}^{\text{hom}}} \alpha_{\pmb{A}_i} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}} \sum (q_{\pmb{A}_i} q_{\pmb{A}_i}^T) + \lambda_{\pmb{L}_{i}^{\text{het}}} \alpha_{\pmb{L}_i} \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}} \sum (q_{\pmb{L}_i} q_{\pmb{L}_i}^T)\right)\right) \\
&= -\sum_{i=1}^N \left(\lambda_{\pmb{A}_{i}^{\text{hom}}} \alpha_{\pmb{A}_i} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}} \left(1 - \frac{1}{N}\sum (q_{\pmb{A}_i} q_{\pmb{A}_i}^T)\right) + \lambda_{\pmb{L}_{i}^{\text{het}}} \alpha_{\pmb{L}_i} \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}} \left(1 - \frac{1}{N} \sum (q_{\pmb{L}_i} q_{\pmb{L}_i}^T)\right)\right) \\
\intertext{Since \(q_i^T q_i = 1\),  \(|q_{ij}| < 1\), \(\sum (q_{i} q_{i}^T) > -N^2\), we have}
\mathcal{L}_{HLCL}&\geq (-1-N) \sum_{i=1}^N \left( \lambda_{\pmb{A}_{i}^{\text{hom}}} \alpha_{\pmb{A}_i} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}} + \lambda_{\pmb{L}_{i}^{\text{het}}} \alpha_{\pmb{L}_i} \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}} \right). \\
\intertext{Since \(\lambda_{\pmb{A}_{i}^{\text{hom}}} \in (-1,1]\), and \(\lambda_{\pmb{L}_{i}^{\text{het}}} \in [0,2)\), we have}
\mathcal{L}_{HLCL}&\geq \frac{-1-N}{2} \sum_{i=1}^N \left( \alpha_{\pmb{A}_i} \left(2 - (\lambda_{\pmb{A}_{i}^{\text{hom}}} - \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}})^2\right) + \alpha_{\pmb{L}_i} \left(4 - (\lambda_{\pmb{L}_{i}^{\text{het}}} - \lambda_{\tilde{\pmb{L}}_{i}^{\text{het}}})^2\right)\right).
\end{align*}

\end{proof}
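
The last step of the proof rests on an elementary identity, illustrated here for the homophilic term. For real numbers \(a\) and \(b\),
\[
ab = \frac{1}{2}\left(a^2 + b^2 - (a-b)^2\right).
\]
Taking \(a = \lambda_{\pmb{A}_{i}^{\text{hom}}}\) and \(b = \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}}\), both in \((-1,1]\), gives \(a^2 + b^2 \leq 2\) and hence
\[
\lambda_{\pmb{A}_{i}^{\text{hom}}} \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}} \leq \frac{1}{2}\left(2 - (\lambda_{\pmb{A}_{i}^{\text{hom}}} - \lambda_{\tilde{\pmb{A}}_{i}^{\text{hom}}})^2\right).
\]
Multiplying by \((-1-N)\,\alpha_{\pmb{A}_i}\), which is non-positive under the assumption (made explicit here) that \(\alpha_{\pmb{A}_i} \geq 0\), reverses the inequality and gives the homophilic part of the bound.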
