Score-of-Mixture Training: One-Step Generative Model Training Made Simple via Score Estimation of Mixture Distributions

Published: 01 May 2025, Last Modified: 23 Jul 2025 · ICML 2025 Spotlight Poster · CC BY 4.0
TL;DR: We introduce Score-of-Mixture Training, a simple and stable framework for training one-step generative models from scratch by minimizing the α-skew Jensen–Shannon divergence, which it does by estimating the score of the mixture distribution of real and fake samples across multiple noise levels.
Abstract: We propose *Score-of-Mixture Training* (SMT), a novel framework for training one-step generative models by minimizing a class of divergences called the $\alpha$-skew Jensen–Shannon divergence. At its core, SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels. Similar to consistency models, our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call *Score-of-Mixture Distillation* (SMD). It is simple to implement, requires minimal hyperparameter tuning, and ensures stable training. Experiments on CIFAR-10 and ImageNet 64×64 show that SMT/SMD are competitive with and can even outperform existing methods.
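To make the abstract concrete, here is a minimal, illustrative sketch of one ingredient of the approach: fitting the score of the noised mixture of real and generated samples with denoising score matching. This is not the authors' implementation (see the linked repository for that); the `score_net(x, sigma)` and `generator(z)` signatures, the log-uniform noise schedule, and the unweighted loss are placeholder assumptions, and the generator update and the paper's handling of $\alpha$ are omitted.

```python
# Illustrative sketch only; the official code is in the linked repository.
# Each training sample is drawn from real data with probability alpha and from the
# one-step generator otherwise, noised at a random level, and fit with denoising
# score matching, so score_net approximates the score of the noised mixture
# alpha * p_data + (1 - alpha) * p_fake at that noise level.
import torch
import math

def mixture_score_step(score_net, generator, real_batch, alpha=0.5,
                       sigma_min=0.01, sigma_max=10.0):
    """One denoising-score-matching step on the real/fake mixture.

    Assumes image batches of shape (B, C, H, W) and hypothetical signatures
    score_net(x, sigma) and generator(z).
    """
    b = real_batch.shape[0]
    with torch.no_grad():
        z = torch.randn_like(real_batch)
        fake_batch = generator(z)  # one-step generator samples
    # Pick each sample from real data with probability alpha, else from the generator.
    use_real = (torch.rand(b, 1, 1, 1, device=real_batch.device) < alpha).float()
    x = use_real * real_batch + (1.0 - use_real) * fake_batch
    # Random per-sample noise level (log-uniform, a common diffusion-style choice).
    log_sigma = math.log(sigma_min) + torch.rand(b, device=x.device) * (
        math.log(sigma_max) - math.log(sigma_min)
    )
    sigma = log_sigma.exp().view(b, 1, 1, 1)
    noise = torch.randn_like(x)
    x_noisy = x + sigma * noise
    # Denoising score matching: the minimizer approximates the score of the
    # noised mixture distribution at this noise level.
    target = -noise / sigma
    pred = score_net(x_noisy, sigma.flatten())
    return ((pred - target) ** 2).mean()
```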
Lay Summary: Generative modeling enables the exploration of the statistical structure inherent in data by learning to produce rich, diverse, and realistic samples. In this paper, we develop a method for efficient one-step generative modeling, where high-quality samples are produced in a single model execution. Recently, diffusion models have become popular for generation, but they require many iterative steps to transform noise into structure. Recent efforts to enable one-step generation typically rely on distilling such pre-trained diffusion models, which is computationally expensive. Alternatives that train one-step models from scratch often suffer from instability or expensive simulation. We show that one-step generative models can be trained from scratch without costly pre-training or distillation. Our method centers on learning a model that estimates the gradient of the mixture distribution of real and generated data. Inspired by advances in diffusion modeling, we introduce a novel, stable, and efficient training scheme for one-step generation that is based purely on ensuring distributional overlap between real and generated samples, using distribution-matching principles from information theory.
Link To Code: https://github.com/tkj516/score-of-mixture-training
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: one-step generation, skew Jensen-Shannon Divergence, diffusion models, score estimation
Submission Number: 11626