RankGen: A Statistically Robust Framework for Ranking Generative Models Using Classifier-Based Metrics

ICLR 2026 Conference Submission 18207 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Generative models, Model evaluation, PAC-style bounds, Classifier-based probes, Evaluation Metrics
TL;DR: RankGen is a statistically robust framework for evaluating generative models via four classifier-based metrics with PAC-style guarantees, yielding interpretable rankings and exposing failure modes such as memorization that standard metrics overlook.
Abstract: Standard metrics for evaluating generative models are brittle, easy to game, and often ignore task relevance. We introduce RankGen, a unified evaluation framework built on four classifier-based metrics: Quality, Utility, Indistinguishability, and Similarity. Each metric is designed to capture a distinct failure mode and is supported by a PAC-style generalization bound. RankGen follows a two-stage process: models that violate their bounds are discarded, and the rest are ranked using robust, quantile-based summaries. The resulting composite score, Exchangeability, captures both fidelity and task relevance. By exposing hidden pathologies such as memorization, RankGen provides a principled foundation for safer model selection and deployment.
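The abstract describes a two-stage procedure but gives no implementation details. The sketch below illustrates one plausible reading in Python: filter out models whose empirical metric violates a bound, then rank survivors by a composite of robust quantile summaries. All names here (`violates_bound`, `quantile_summary`, `rankgen`, the mean-of-quantiles aggregation, and the bound thresholds) are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def violates_bound(scores: np.ndarray, bound: float) -> bool:
    """Illustrative stand-in for a PAC-style check: flag a model whose
    empirical metric falls below an assumed lower bound."""
    return scores.mean() < bound

def quantile_summary(scores: np.ndarray, q: float = 0.25) -> float:
    """Robust quantile-based summary (lower quartile here), used in
    place of the mean to resist outliers and gaming."""
    return float(np.quantile(scores, q))

def rankgen(models: dict[str, dict[str, np.ndarray]],
            bounds: dict[str, float]) -> list[tuple[str, float]]:
    """Two-stage, RankGen-style selection: discard bound violators,
    then rank survivors by a composite of per-metric summaries."""
    survivors: dict[str, float] = {}
    for name, metrics in models.items():
        # Stage 1: discard any model that violates a bound on any metric.
        if any(violates_bound(metrics[m], bounds[m]) for m in bounds):
            continue
        # Stage 2: composite "Exchangeability"-style score, taken here as
        # the mean of the robust per-metric summaries (an assumed choice).
        survivors[name] = float(np.mean(
            [quantile_summary(metrics[m]) for m in bounds]))
    return sorted(survivors.items(), key=lambda kv: kv[1], reverse=True)

# Example usage with synthetic scores for the four metrics.
rng = np.random.default_rng(0)
metric_names = ["quality", "utility", "indistinguishability", "similarity"]
models = {f"model_{i}": {m: rng.uniform(0.4, 1.0, 200) for m in metric_names}
          for i in range(3)}
bounds = {m: 0.5 for m in metric_names}
print(rankgen(models, bounds))
```

The quantile summary is the key robustness choice: a model cannot raise its rank by producing a few exceptional samples, since the lower quartile reflects its typical, not best-case, behavior.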
Primary Area: generative models
Submission Number: 18207