
\section{Introduction}\label{sec:intro}

Graphs are foundational mathematical structures used across diverse domains such as materials science, finance, biology, and chemistry \citep{battaglia2018relational, wu2020comprehensive, zhou2020graph, bruna2013spectral, chen2020simple,defferrard2016convolutional}. Whether learned as an end goal or as a preprocessing step for other models, graphs play a pivotal role in representing and analyzing intricate relationships within datasets \citep{kumar2020unified}. However, the ever-increasing size of datasets presents a formidable challenge: executing downstream tasks on large graphs requires substantial memory and computational power \citep{chen2022graph}. This growing scale underscores the need for approaches that keep graph-based analysis tractable on large datasets.

In response to these challenges, techniques such as graph coarsening \citep{loukas2018spectrally, loukas2019graph, kumar2023unified, pmlr-v202-kumar23a, https://doi.org/10.48550/arxiv.1102.2950, https://doi.org/10.48550/arxiv.1004.1220, hendrickson1995multi}, graph condensation \citep{jin2021graph}, and graph summarization \citep{riondato2017graph} have emerged. These approaches aim to learn a smaller, more tractable graph that retains the properties of the original graph.

Among recent graph reduction techniques, \cite{loukas2018spectrally, loukas2019graph} are heuristic-based approaches, \cite{jin2021graph} is a deep learning-based technique, and \cite{kumar2023unified} is an optimization-based framework. They also differ in the information they use:
\cite{loukas2018spectrally, loukas2019graph} consider only the Laplacian of the original graph, while \cite{jin2021graph,kumar2023unified} consider both the Laplacian matrix and the feature matrix of the original graph when learning a coarsened graph.

These methods learn a mapping matrix that assigns nodes in the original graph to supernodes in the coarsened graph. For a given original graph, multiple coarsened graphs can be generated. To assess the quality of a coarsened graph, we use the node profile matrix \citep{ghoroghchian2021graph}, denoted $\phi$ and detailed in Section~\ref{nodelabelmatrix}. This matrix depends on the mapping matrix and the one-hot label matrix of the original graph, and it is essential for achieving a well-balanced mapping. For a coarsened graph to serve downstream tasks well, its node profile matrix should ideally be as sparse as possible. However, existing graph coarsening methods cannot learn coarsened graphs with sparse $\phi$ matrices, which limits their effectiveness for downstream tasks.
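
As an illustration only (the formal definition is deferred to Section~\ref{nodelabelmatrix}), suppose the mapping matrix is $C\in\{0,1\}^{p\times k}$, assigning each of $p$ nodes to exactly one of $k$ supernodes, and the one-hot label matrix is $Y\in\{0,1\}^{p\times c}$ for $c$ classes. The symbol $Y$, these dimensions, and the product form $\phi=C^\top Y$ are assumptions made for this sketch. Under them, the entry $\phi_{ij}$ counts the nodes of class $j$ mapped to supernode $i$. For $p=4$, $k=2$, and $c=2$,
\[
C=\left(\begin{array}{cc}1&0\\1&0\\0&1\\0&1\end{array}\right),\qquad
Y=\left(\begin{array}{cc}1&0\\1&0\\0&1\\1&0\end{array}\right),\qquad
\phi=C^\top Y=\left(\begin{array}{cc}2&0\\1&1\end{array}\right).
\]
The first supernode contains nodes of a single class, so its row of $\phi$ has one non-zero entry, whereas the second supernode mixes two classes and its row is dense. In this sense a sparser $\phi$ corresponds to supernodes that are more consistent with the labels.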

To improve downstream performance with coarsened graphs, achieving a sparse node profile matrix is crucial. In this paper, we propose an optimization-based method that incorporates a function of the mapping matrix $C$ and a one-hot matrix of a subset of the node labels of the original graph. The proposed formulation also includes Dirichlet energy and log-determinant terms, yielding a non-convex optimization problem that can be solved efficiently with the block successive upper bound minimization (BSUM) technique. We present the Label-Aware Graph Coarsening (LAGC) algorithm, which updates the variables iteratively, one at a time, while keeping the others fixed. We prove that the algorithm converges.
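
As a generic sketch of the BSUM update pattern (the symbols $f$, $u_i$, and $x_i$ below are illustrative placeholders, not the variables of the LAGC formulation), consider minimizing $f(x_1,\dots,x_n)$ over blocks $x_i\in\mathcal{X}_i$. At iteration $t$, a single block $i$ is selected and updated as
\[
x_i^{t+1}\in\arg\min_{x_i\in\mathcal{X}_i} u_i\left(x_i; x^{t}\right), \qquad x_j^{t+1}=x_j^{t}\ \text{ for } j\neq i,
\]
where the surrogate $u_i(\cdot\,; x^{t})$ upper-bounds $f$ in the $i$-th block, $u_i(x_i; x^{t})\geq f(x_1^{t},\dots,x_{i-1}^{t},x_i,x_{i+1}^{t},\dots,x_n^{t})$, and is tight at the current iterate, $u_i(x_i^{t}; x^{t})=f(x^{t})$. Under these two conditions an update cannot increase the objective, since $f(x^{t+1})\leq u_i(x_i^{t+1};x^{t})\leq u_i(x_i^{t};x^{t})=f(x^{t})$.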

To demonstrate the efficacy of our algorithm, we apply it to two downstream tasks, node classification and link prediction, using the coarsened graph. With the LAGC algorithm, we learn the coarsened graph from the graph matrix, the feature matrix, and a subset of the node labels of the original graph. We then train a Graph Neural Network (GNN) on the learned coarsened graph and test it on the original graph. Our results show a substantial performance improvement over existing state-of-the-art methods.

Our main contributions can be summarized as follows:
\begin{enumerate}
    \item To the best of our knowledge, this is the first optimization-based method that leverages the graph matrix, feature matrix, and label matrix of the original graph to learn a more informative coarsened graph that is better suited to downstream tasks.

    \item The proposed optimization problem is solved efficiently with the block successive upper bound minimization (BSUM) technique, which updates one variable at a time while keeping the others fixed. Additionally, the method is proven to be convergent.

    \item To demonstrate the effectiveness of the proposed algorithm, we evaluate it on two downstream tasks, node classification and link prediction. We train a Graph Neural Network (GNN) on the coarsened graph and test it on the original graph. In these experiments, LAGC significantly outperforms state-of-the-art methods.
    \end{enumerate}
\subsection{Outline and Notation}
The paper is organized as follows. Section 2 presents background on graphs, graph learning from data, and graph coarsening techniques, and introduces the proposed LAGC formulation. Section 3 develops our algorithm. Finally, Section 4 presents the results of our experiments on real-world datasets.
%We present additional experiments with proof of convergence of the (proposed) FGC algorithm in the supplementary material.
\\ 
In terms of notation, lower case (bold) letters denote scalars (vectors) and upper case letters denote matrices. The dimension of a matrix is omitted whenever it is clear from the context. The $(i,j)$-th entry of a matrix $X$ is denoted by $X_{ij}$. $X^\dagger$ and $X^\top$ denote the pseudo inverse and transpose of matrix $X$, respectively. $X_i$ and $[X^\top]_j$ denote the $i$-th column and $j$-th row of matrix $X$, respectively. The all-zero and all-one vectors or matrices of appropriate sizes are denoted by $\bzero$ and $\bone$, respectively. $\norm{X}_1$, $\norm{X}_F$, and $\norm{X}_{1,2}$ denote the $\ell_1$-norm, Frobenius norm, and $\ell_{1,2}$-norm of $X$, respectively. The Euclidean norm of a vector is denoted by $\norm{\cdot}_{2}$. $\text{det}(X)$ is defined as the generalized determinant of a positive semidefinite matrix $X$, i.e., the product of its non-zero eigenvalues. The inner product of two matrices is defined as $\langle X, Y\rangle=\text{tr}(X^\top Y)$, where $\text{tr}(\cdot)$ is the trace operator. The inner product of two vectors is defined as $\langle X_i, X_j\rangle=X_i^\top X_j$, where $X_i$ and $X_j$ are the $i$-th and $j$-th columns of matrix $X$. $\mathbb{R}_+$ denotes the set of positive real numbers.
  

