UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees

19 Sept 2025 (modified: 03 Oct 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: partial optimal transport, submodular maximization, subset selection, imbalanced classification
Abstract: Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed toward majority classes and yield low-quality prototypes for minority classes. We introduce UniPROT, a novel subset selection framework that minimizes the optimal transport (OT) distance between a uniformly weighted prototypical distribution and the target distribution. While intuitive, this formulation leads to a cardinality-constrained super-additive maximization problem that is challenging to approximate efficiently. To address this, we propose a principled relaxation of the OT marginal constraints, yielding a partial optimal transport-based submodular objective. We prove that this relaxation is tight and enables a greedy algorithm with a \((1 - \frac{1}{e})\) approximation guarantee relative to the original super-additive maximization problem. Empirically, we show that enforcing uniform prototype weights in UniPROT consistently improves minority-class representation on imbalanced classification benchmarks without compromising majority-class accuracy. In both fine-tuning and pretraining regimes for large language models under domain imbalance, UniPROT enforces uniform source contributions, yielding robust performance gains. Our results establish UniPROT as a scalable, theoretically grounded approach to uniform-weight prototype selection.
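The abstract's key algorithmic ingredient is greedy maximization of a monotone submodular objective under a cardinality constraint, which carries the classical \((1 - \frac{1}{e})\) guarantee. As a minimal sketch of that pattern (not UniPROT's actual partial-OT objective, which the abstract does not specify), the following uses a facility-location objective \(f(S) = \sum_j \max_{i \in S} \mathrm{sim}(i, j)\) as a stand-in submodular surrogate for prototype quality; the `sim` matrix and function names are illustrative assumptions:

```python
import numpy as np

def greedy_select(sim, k):
    """Greedy maximization of a facility-location objective (a stand-in
    submodular surrogate, not the paper's partial-OT objective).

    sim[i, j]: similarity between source candidate i and target point j.
    f(S) = sum_j max_{i in S} sim[i, j] is monotone submodular, so plain
    greedy selection achieves a (1 - 1/e) approximation to the optimum.
    """
    n_src, n_tgt = sim.shape
    best = np.zeros(n_tgt)  # current best similarity per target point
    selected = []
    for _ in range(min(k, n_src)):
        # Marginal gain of adding each candidate to the current set.
        gains = np.maximum(sim, best).sum(axis=1) - best.sum()
        gains[selected] = -np.inf  # never re-pick a selected candidate
        i = int(np.argmax(gains))
        selected.append(i)
        best = np.maximum(best, sim[i])
    return selected
```

Note that each selected prototype enters with equal (uniform) weight, mirroring the uniform-weight constraint the abstract emphasizes; the paper's contribution is showing that a relaxed partial-OT objective fits this submodular template tightly.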
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 20947