RankGen: A Statistically Robust Framework for Ranking Generative Models Using Classifier-Based Metrics

ICLR 2026 Conference Submission 18207 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Generative models, Model evaluation, PAC-style bounds, Classifier-based probes, Evaluation Metrics
TL;DR: RankGen is a statistically robust framework for evaluating generative models via four classifier-based metrics with PAC-style guarantees, yielding interpretable rankings and exposing failure modes such as memorization that standard metrics overlook.
Abstract: Standard metrics for evaluating generative models are brittle, easy to game, and often ignore task relevance. We introduce RankGen, a unified evaluation framework built on four classifier-based metrics: Quality, Utility, Indistinguishability, and Similarity. Each metric is designed to capture a distinct failure mode and is supported by a PAC-style generalization bound. RankGen follows a two-stage process: models that violate their bounds are discarded, and the rest are ranked using robust, quantile-based summaries. The resulting composite score, Exchangeability, captures both fidelity and task relevance. By exposing hidden pathologies such as memorization, RankGen provides a principled foundation for safer model selection and deployment.
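The abstract describes a two-stage procedure but gives no implementation details. The sketch below illustrates one plausible reading in Python: filter out models whose empirical metric violates a bound, then rank survivors by a composite of robust quantile summaries. All names here (`violates_bound`, `quantile_summary`, `rankgen`, the mean-of-quantiles aggregation, and the bound thresholds) are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def violates_bound(scores: np.ndarray, bound: float) -> bool:
    """Illustrative stand-in for a PAC-style check: flag a model whose
    empirical metric falls below an assumed lower bound."""
    return scores.mean() < bound

def quantile_summary(scores: np.ndarray, q: float = 0.25) -> float:
    """Robust quantile-based summary (lower quartile here), used in
    place of the mean to resist outliers and gaming."""
    return float(np.quantile(scores, q))

def rankgen(models: dict[str, dict[str, np.ndarray]],
            bounds: dict[str, float]) -> list[tuple[str, float]]:
    """Two-stage, RankGen-style selection: discard bound violators,
    then rank survivors by a composite of per-metric summaries."""
    survivors: dict[str, float] = {}
    for name, metrics in models.items():
        # Stage 1: discard any model that violates a bound on any metric.
        if any(violates_bound(metrics[m], bounds[m]) for m in bounds):
            continue
        # Stage 2: composite "Exchangeability"-style score, taken here as
        # the mean of the robust per-metric summaries (an assumed choice).
        survivors[name] = float(np.mean(
            [quantile_summary(metrics[m]) for m in bounds]))
    return sorted(survivors.items(), key=lambda kv: kv[1], reverse=True)

# Example usage with synthetic scores for the four metrics.
rng = np.random.default_rng(0)
metric_names = ["quality", "utility", "indistinguishability", "similarity"]
models = {f"model_{i}": {m: rng.uniform(0.4, 1.0, 200) for m in metric_names}
          for i in range(3)}
bounds = {m: 0.5 for m in metric_names}
print(rankgen(models, bounds))
```

The quantile summary is the key robustness choice: a model cannot raise its rank by producing a few exceptional samples, since the lower quartile reflects its typical, not best-case, behavior.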
Primary Area: generative models
Submission Number: 18207