Keywords: approximation algorithm, submodular maximization, max-min diversification, data sampling, subset selection
Abstract: This work studies a novel subset selection problem called *max-min diversification with monotone submodular utility* (MDMS), which has a wide range of applications in machine learning, e.g., data sampling and feature selection.
Given a set of points in a metric space,
the goal of MDMS is to maximize $f(S) = g(S) + \lambda \cdot \text{div}(S)$
subject to a cardinality constraint $|S| \le k$,
where
$g(S)$ is a monotone submodular function
and
$\text{div}(S) = \min_{u,v \in S : u \ne v} \text{dist}(u,v)$ is the *max-min diversity* objective.
We propose the `GIST` algorithm, which gives a $\frac{1}{2}$-approximation guarantee for MDMS
by approximating a series of maximum independent set problems with a bicriteria greedy algorithm.
We also prove that it is NP-hard to approximate MDMS to within a factor of $0.5584$.
Finally, our empirical study shows that `GIST` outperforms state-of-the-art benchmarks
for a single-shot data sampling task on ImageNet.
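To make the objective concrete, the following minimal Python sketch evaluates $f(S) = g(S) + \lambda \cdot \text{div}(S)$ on a toy instance. The coverage-style utility `coverage_utility`, the Euclidean metric, the `lam` parameter name, and the $\text{div}(S) = \infty$ convention for $|S| < 2$ are illustrative assumptions, not the paper's setup; this is only the objective being maximized, not the `GIST` algorithm itself.

```python
# Minimal sketch of the MDMS objective f(S) = g(S) + lambda * div(S).
# The coverage utility and Euclidean metric are illustrative assumptions;
# the paper only requires g to be monotone submodular and dist to be a metric.
import itertools
import math

def coverage_utility(S, covers):
    """Hypothetical monotone submodular g: number of labels covered by S."""
    return len(set().union(*(covers[v] for v in S))) if S else 0

def min_pairwise_distance(S, points):
    """Max-min diversity: div(S) = min over distinct u, v in S of dist(u, v)."""
    if len(S) < 2:
        return math.inf  # convention assumed here; the paper's choice may differ
    return min(
        math.dist(points[u], points[v])
        for u, v in itertools.combinations(S, 2)
    )

def mdms_objective(S, points, covers, lam):
    """f(S) = g(S) + lam * div(S), to be maximized subject to |S| <= k."""
    return coverage_utility(S, covers) + lam * min_pairwise_distance(S, points)

# Toy instance: three 2-D points, each covering some labels.
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0)}
covers = {0: {"a"}, 1: {"a", "b"}, 2: {"c"}}
print(mdms_objective({1, 2}, points, covers, lam=0.5))  # 3 + 0.5 * sqrt(5)
```

The trade-off governed by $\lambda$ is visible even in this toy instance: the coverage term favors sets of points with complementary labels, while the max-min term penalizes any set containing a close pair.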
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 10383