Abstract: Diffusion models now generate high-quality, diverse samples, with an increasing focus on more powerful models. Although ensembling is a well-known way to improve supervised models, its application to unconditional score-based diffusion models remains largely unexplored. In this work we investigate whether it provides tangible benefits for generative modelling. We find that while ensembling generally improves the score-matching loss and model likelihood, it fails to consistently enhance perceptual quality metrics such as FID. Our study spans a breadth of aggregation rules using Deep Ensembles, Monte Carlo Dropout, and Random Forests on CIFAR-10, FFHQ, and tabular data. We investigate possible explanations for this discrepancy, such as the link between score estimation and image quality. Finally, we provide theoretical insights into the summing of score models, which shed light not only on ensembling but also on several model composition techniques (e.g. guidance).
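The core operation studied here, aggregating the score estimates of several independently trained models, can be illustrated with a minimal toy sketch. Everything below is hypothetical and not from the paper: each "score model" estimates the score of a 1-D standard normal, perturbed with its own fixed bias to mimic ensemble members, and the aggregation rule shown is a simple pointwise mean.

```python
import numpy as np

# Hypothetical toy setup: the true score of N(0, 1) is -x; each ensemble
# member returns -x plus its own fixed bias, mimicking independently
# trained score networks. All names here are illustrative.
rng = np.random.default_rng(0)

def make_score_model(bias):
    return lambda x: -x + bias

ensemble = [make_score_model(b) for b in rng.normal(0.0, 0.1, size=5)]

def ensemble_score(x):
    # Mean aggregation: average the members' score estimates pointwise.
    return np.mean([model(x) for model in ensemble], axis=0)

x = np.linspace(-2.0, 2.0, 5)
print(ensemble_score(x))  # approximately -x when member biases average out
```

In a sampler, `ensemble_score` would simply replace the single model's score estimate at each denoising step; the paper's finding is that this kind of averaging helps score-matching loss and likelihood but not necessarily FID.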
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sylvain_Le_Corff1
Submission Number: 5530