Finetuning Generative Models to Match Feature Distributions

Nathaniel Lee Diamant; Brian L. Trippe

Published: 30 May 2026, Last Modified: 01 Jun 2026 | SPIGM @ ICML Poster | CC BY 4.0

Keywords: generative models, AI for science, finetuning, MMD

Abstract: Generative models can produce individually plausible samples while failing to match a target distribution, a costly failure mode when distributional structure carries scientific signal. We study this problem through _feature distributions_: distributions of domain-specific features induced by model samples. For example, a protein generator may need to match the distribution of secondary-structure content observed in biology, while a molecular generator may need to match the property distribution of a small antibiotic dataset without memorizing the target molecules. Standard reward finetuning optimizes per-sample rewards and is therefore poorly aligned with this distribution-matching goal. We introduce kernel Calibrating Generative Models (kCGM), a method for finetuning autoregressive, diffusion, and other generative models using only samples from a target feature distribution. Across protein and small-molecule generation tasks, kCGM improves feature-distribution matching over baselines, including a 174-molecule antibiotic setting where it outperforms supervised finetuning while increasing chemical validity. These results show that feature distributions provide a practical training signal for adapting pretrained generative models.
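The abstract does not state the training objective, but the MMD keyword points to a kernel two-sample statistic computed on features. As a rough illustration only, and not the authors' kCGM implementation, the sketch below computes a biased estimate of squared maximum mean discrepancy between features of model samples and features of target samples. The Gaussian kernel, the bandwidth values, the function names, and the toy one-dimensional feature values are all assumptions made for this example.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(model_feats, target_feats, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two feature samples."""

    def mean_kernel(xs, ys):
        # Average kernel value over all pairs, including identical pairs.
        total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(model_feats, model_feats)
        + mean_kernel(target_feats, target_feats)
        - 2.0 * mean_kernel(model_feats, target_feats)
    )


# Hypothetical one-dimensional features, e.g. a helix fraction per generated protein.
target = [[0.30], [0.35], [0.40], [0.45]]
matched = [[0.32], [0.36], [0.41], [0.44]]
shifted = [[0.80], [0.85], [0.90], [0.95]]

# A feature sample close to the target should score lower than a shifted one.
print(mmd_squared(matched, target, bandwidth=0.1))
print(mmd_squared(shifted, target, bandwidth=0.1))
```

The statistic depends on the whole batch of features rather than on any single sample, which is the contrast the abstract draws with per-sample reward finetuning. How such a quantity is turned into a finetuning signal for autoregressive or diffusion models is not described in this abstract and is left out of the sketch.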

Submission Number: 35
