Tackling Provably Hard Representative Selection via Graph Neural Networks

Published: 20 Jun 2023, Last Modified: 11 Oct 2023, SODS 2023 Oral
Keywords: Hardness of Representative Selection, Active Learning, Data Summarization, Attributed Graphs, Graph Neural Networks
TL;DR: We 1) provide hardness results for representative selection, 2) show that a graph context can make hard problems solvable, and 3) develop an approach for representative selection based on graph neural networks.
Abstract: Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for unlabeled datasets and focus on finding representatives that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely-used surrogate functions and the actual accuracy of the model. We then study a setting where additional information in the form of a (homophilous) graph structure is available, or can be constructed, between the data points. We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RSGNN, a representation learning-based RS model based on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RSGNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RSGNN achieves significant improvements over established baselines on a suite of eight benchmarks.
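To make the second setting concrete, the sketch below illustrates one simple way a graph can be induced from node feature similarities and then exploited for selection. This is a hedged illustration only, not the paper's RSGNN model: it replaces the learned GNN with parameter-free GCN-style mean aggregation and uses greedy farthest-point selection on the smoothed embeddings; the function names (`knn_graph`, `propagate`, `greedy_representatives`) and the choice of `k`, hop count, and selection rule are all assumptions for exposition.

```python
import numpy as np

def knn_graph(X, k=5):
    """Induce a symmetric k-nearest-neighbor graph from feature similarities."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d, np.inf)                          # exclude self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]                  # k closest points per node
    A = np.zeros((len(X), len(X)))
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                            # symmetrize

def propagate(X, A, hops=2):
    """Parameter-free GCN-style smoothing: mean over neighbors plus self-loop."""
    A_hat = A + np.eye(len(A))
    deg = A_hat.sum(axis=1, keepdims=True)
    H = X.copy()
    for _ in range(hops):
        H = (A_hat @ H) / deg
    return H

def greedy_representatives(H, m):
    """Greedy farthest-point selection on the smoothed embeddings."""
    chosen = [0]
    dist = ((H - H[0]) ** 2).sum(axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))                       # farthest from current set
        chosen.append(nxt)
        dist = np.minimum(dist, ((H - H[nxt]) ** 2).sum(axis=1))
    return chosen
```

Under this (assumed) pipeline, a homophilous graph pulls each point's embedding toward its neighborhood, so the greedy step tends to pick one exemplar per cluster rather than near-duplicates, which is the intuition behind using graph structure to aid selection.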
Submission Number: 21