Finetuning Generative Models to Match Feature Distributions
Keywords: generative models, AI for science, finetuning, MMD
Abstract: Generative models can produce individually plausible samples while failing to match a target distribution, a costly failure mode when distributional structure carries scientific signal. We study this problem through _feature distributions_: distributions of domain-specific features induced by model samples. For example, a protein generator may need to match the distribution of secondary-structure content observed in biology, while a molecular generator may need to match the property distribution of a small antibiotic dataset without memorizing the target molecules. Standard reward finetuning optimizes per-sample rewards and is therefore poorly aligned with this distribution-matching goal. We introduce kernel Calibrating Generative Models (kCGM), a method for finetuning autoregressive, diffusion, and other generative models using only samples from a target feature distribution.
Across protein and small-molecule generation tasks, kCGM improves feature-distribution matching over baselines, including a 174-molecule antibiotic setting where it outperforms supervised finetuning while increasing chemical validity.
These results show that feature distributions provide a practical training signal for adapting pretrained generative models.
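To make the notion of matching a feature distribution concrete, the following is a minimal sketch of a squared maximum mean discrepancy (MMD) estimate between features of model samples and features from a target set. It is an illustration only, not the kCGM training objective: the abstract does not specify the kernel, the estimator, or how gradients reach the generator. The Gaussian kernel, the biased estimator, the `bandwidth` parameter, the function names, and the synthetic one-dimensional features (standing in for something like secondary-structure content) are all assumptions made for this example.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors (sequences of floats)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(model_feats, target_feats, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two feature samples.

    Smaller values mean the two empirical feature distributions are closer
    under the chosen kernel.
    """
    def mean_kernel(xs, ys):
        # Average kernel value over all pairs drawn from xs and ys.
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(model_feats, model_feats)
            + mean_kernel(target_feats, target_feats)
            - 2.0 * mean_kernel(model_feats, target_feats))


# Hypothetical scalar features; the values are synthetic, not from the paper.
rng = random.Random(0)
target = [[rng.gauss(0.4, 0.1)] for _ in range(200)]
matched = [[rng.gauss(0.4, 0.1)] for _ in range(200)]   # same distribution
shifted = [[rng.gauss(0.7, 0.1)] for _ in range(200)]   # miscalibrated model

print("matched vs target:", mmd_squared(matched, target, bandwidth=0.1))
print("shifted vs target:", mmd_squared(shifted, target, bandwidth=0.1))
```

In this sketch the shifted sample yields a larger value than the matched one, which is the kind of population-level discrepancy that a per-sample reward does not directly measure.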
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 35