Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study

TMLR Paper 3931 Authors

09 Jan 2025 (modified: 13 Apr 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: As graph data grows increasingly complex, training graph neural networks (GNNs) on large-scale datasets presents significant challenges, including computational resource constraints, data redundancy, and transmission inefficiencies. While existing graph condensation techniques have shown promise in addressing these issues, they are predominantly designed for single-label datasets, where each node is associated with one class label. However, many real-world applications, such as social network analysis and bioinformatics, involve multi-label graph datasets, where a single node can carry several related labels. To address this gap, we extend traditional graph condensation approaches to multi-label datasets by modifying both the synthetic dataset initialization and the condensation optimization objective. Through experiments on eight real-world multi-label graph datasets, we demonstrate the effectiveness of our method. In our experiments, the GCond framework, combined with K-Center initialization and binary cross-entropy loss (BCELoss), generally achieves the best performance. This benchmark for multi-label graph condensation not only enhances the scalability and efficiency of GNNs on multi-label graph data but also offers substantial benefits for diverse real-world applications.
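The two modifications named in the abstract, K-Center initialization of the synthetic nodes and replacing the single-label classification objective with BCELoss, can be illustrated with a minimal PyTorch sketch. This is not the authors' released code: the function names, tensor shapes, and the greedy K-Center variant below are assumptions for illustration only.

```python
# Minimal sketch (assumed, not the authors' implementation) of the two
# changes described in the abstract: K-Center initialization of synthetic
# node features, and a multi-label BCE objective in place of cross-entropy.
import torch
import torch.nn.functional as F

def kcenter_init(features: torch.Tensor, n_syn: int) -> torch.Tensor:
    """Greedy K-Center: select n_syn real nodes whose feature vectors are
    maximally spread out, and copy them as the initial synthetic features."""
    idx = [torch.randint(len(features), (1,)).item()]      # random first center
    dist = torch.cdist(features, features[idx]).squeeze(1)  # distance to nearest center
    for _ in range(n_syn - 1):
        idx.append(torch.argmax(dist).item())               # farthest point becomes a center
        new_d = torch.cdist(features, features[idx[-1]].unsqueeze(0)).squeeze(1)
        dist = torch.minimum(dist, new_d)                   # update nearest-center distances
    return features[idx].clone()

def condensation_loss(logits_syn: torch.Tensor, y_syn: torch.Tensor) -> torch.Tensor:
    """Multi-label objective: y_syn is a {0,1} matrix of shape
    (n_syn, n_labels) rather than a vector of class indices, so the
    single-label cross-entropy used in GCond is replaced by BCE."""
    return F.binary_cross_entropy_with_logits(logits_syn, y_syn.float())
```

In a GCond-style pipeline, `kcenter_init` would replace random or class-wise sampling when creating the synthetic feature matrix, and `condensation_loss` would be used wherever the framework compares GNN predictions on the synthetic graph against their (multi-hot) labels.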
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Chuxu_Zhang2
Submission Number: 3931