Consistency-diversity-realism Pareto fronts of conditional image generative models

Published: 10 Oct 2024, Last Modified: 04 Dec 2024
Venue: NeurIPS 2024 Workshop RBFM Oral
Readers: Everyone
License: CC BY-NC-ND 4.0
Keywords: Image generative models, world models
Abstract: Building world models that accurately and comprehensively represent the real world is a holy grail for image generative models, as it would enable their use as world simulators. For conditional image generative models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in generative models mostly focuses on creative applications that are predominantly concerned with human preferences of image quality and aesthetics. We note that generative models have inference-time mechanisms (or knobs) that allow control over generation consistency, quality, and diversity. In this paper, we use state-of-the-art text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts that provide a holistic view of the consistency-diversity-realism multi-objective. Our experiments suggest that realism and consistency can be improved simultaneously; however, there is a clear tradeoff between realism/consistency and diversity. By looking at Pareto-optimal points, we note that earlier models are better at representation diversity and worse at consistency-realism, whereas more recent models excel at consistency-realism while significantly decreasing representation diversity. Overall, our analysis shows that there is no single best model and that the choice of model should be determined by the downstream application. With this analysis, we invite the research community to consider Pareto fronts as an analytical tool to measure progress towards world models.
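To make the notion of a Pareto front concrete, the following is a minimal sketch of how non-dominated points could be selected from a set of (consistency, diversity, realism) scores. It is not the authors' implementation: the function names, the example scores, and the convention that higher is better on every objective are all assumptions made for illustration (a metric where lower is better would need its sign flipped first).

```python
from typing import List, Sequence, Tuple

# One point per (model, knob setting): (consistency, diversity, realism).
# Assumption: every score is oriented so that higher is better.
Point = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores, invented purely for illustration.
scores: List[Point] = [
    (0.80, 0.30, 0.85),  # high consistency and realism, low diversity
    (0.60, 0.70, 0.60),  # high diversity, lower consistency and realism
    (0.55, 0.65, 0.55),  # worse than the previous point on all three axes
    (0.70, 0.50, 0.75),  # a middle-ground tradeoff
]

front = pareto_front(scores)
for point in front:
    print(point)
```

In this toy example the third point is dropped because the second beats it on all three objectives, while the remaining three each win on at least one axis. That mirrors the abstract's conclusion that no single configuration is best and the choice depends on the downstream application.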
Submission Number: 41