Abstract: Diffusion models have demonstrated remarkable performance in generative modeling, but generating samples with specific desiderata remains challenging. Existing solutions, such as fine-tuning, best-of-n sampling, and gradient-based guidance, are expensive, inefficient, or limited in applicability. In this work, we introduce FK steering, which applies Feynman-Kac interacting particle systems to the inference-time steering of diffusion models with arbitrary reward functions. FK steering works by generating multiple trajectories, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are chosen such that a high score indicates the particle will yield a high-reward sample. We explore various choices of potentials, rewards, and samplers. Steering text-to-image models with a human preference reward, we find that FK steering outperforms fine-tuned models with just 2 particles. Moreover, FK steering a 0.8B parameter model outperforms a 2.6B model, achieving state-of-the-art performance on prompt fidelity. We also steer text diffusion models with rewards for text quality and rare attributes such as toxicity, and find that FK steering generates lower-perplexity text and enables gradient-free control. Overall, inference-time scaling and steering of diffusion models, even training-free, provides significant quality and controllability benefits. Code available [here](https://github.com/zacharyhorvitz/FK-Diffusion-Steering).
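The resampling loop described in the abstract can be sketched in a few lines. The following is a minimal, self-contained illustration (not the authors' implementation): `denoise_step`, `reward`, and the exponentiated-reward potential `exp(lam * reward)` are simplifying assumptions standing in for a real diffusion sampler, a learned reward model, and the paper's various potential choices.

```python
import math
import random

def fk_steer(init_particles, denoise_step, reward, num_steps, lam=1.0, seed=0):
    """Sketch of FK steering: run several diffusion trajectories ("particles")
    and resample them at intermediate steps using potentials computed from a
    reward on intermediate states."""
    rng = random.Random(seed)
    particles = list(init_particles)
    for t in range(num_steps):
        # Advance each particle one denoising step.
        particles = [denoise_step(x, t, rng) for x in particles]
        # Potentials: here, exponentiated intermediate rewards (one simple
        # choice; the paper explores several potential definitions).
        weights = [math.exp(lam * reward(x)) for x in particles]
        # Multinomial resampling: high-potential particles are duplicated,
        # low-potential particles tend to be dropped.
        particles = rng.choices(particles, weights=weights, k=len(particles))
    # Return the highest-reward final sample.
    return max(particles, key=reward)

# Toy example: "denoising" contracts scalar states toward 1.0 with small
# noise, and the reward prefers values close to 1.0.
def toy_step(x, t, rng):
    return x + 0.5 * (1.0 - x) + rng.gauss(0, 0.05)

init_rng = random.Random(42)
best = fk_steer([init_rng.gauss(0, 1) for _ in range(4)],
                toy_step, lambda x: -abs(x - 1.0), num_steps=10)
```

With real diffusion models, `denoise_step` would be one reverse-diffusion step and `reward` a preference or attribute model evaluated on (predicted) intermediate states; the loop itself is unchanged.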
Lay Summary: Recently, AI systems have become increasingly effective at generating text, images, videos, and even new potential treatments for disease. Diffusion models have been a highly successful approach for these settings. However, getting diffusion models to produce exactly what a particular user wants can be challenging. One way to do this is to retrain a model, but this is slow, requires training resources, and may not work in many cases. This paper introduces Feynman-Kac (FK) steering, an approach that guides the model while it's generating outputs, using a scoring system to keep only the most promising candidates. It doesn’t require any additional training and works by generating multiple candidates in parallel and favoring the ones more likely to meet the desired goal. The method performs surprisingly well. We show that FK steering improves models for image and text generation. Notably, small models using FK steering can outperform larger models using fewer resources. In sum, FK steering offers an efficient and flexible way to control AI-generated content.
Link To Code: https://github.com/zacharyhorvitz/FK-Diffusion-Steering
Primary Area: Probabilistic Methods->Monte Carlo and Sampling Methods
Keywords: Diffusion models, steering, fine-tuning, particle-based sampling, sequential Monte Carlo
Submission Number: 1198