Model-Agnostic Graph Dataset Compression with the Tree Mover’s Distance

Mika Sarkin Jain; Stefanie Jegelka; Ishani Karmarkar; Luana Ruiz; Ellen Vitercik

Published: 18 Jun 2024, Last Modified: 10 Jul 2024. Venue: WANT@ICML 2024 Poster. License: CC BY 4.0

Keywords: graph neural networks, tree mover's distance, graph classification

TL;DR: We present new approaches for graph dataset compression for efficient graph learning using the tree mover's distance.

Abstract: Graph neural networks have demonstrated remarkable success across a variety of domains. However, the acquisition and management of large-scale graph datasets pose several challenges. Acquiring graph-level labels can be prohibitively costly, especially for applications in the biosciences and combinatorial optimization. Storage and privacy constraints can pose additional challenges. In this work, we propose an approach to data subset selection for graph datasets that downsamples graphs and nodes based on the Tree Mover’s Distance (TMD). We provide new efficient methods for computing the TMD in our setting, empirical results showing that our approach outperforms other node and graph sampling methods, and theoretical results bounding the decrease in accuracy caused by training on the downsampled graphs. Surprisingly, we find that with our method, we can subsample down to 1% of the number of graphs and 10% of the number of nodes on some datasets, with minimal degradation in model accuracy.

Submission Number: 44
