Keywords: sampling, energy-based models, discrete sampling, synergistic interactions, Markov chain Monte Carlo
TL;DR: Use a predictor's training checkpoints as an annealing schedule to improve sampling
Abstract: Sampling from trained predictors is fundamental for interpretability and as a compute-light alternative to diffusion models, but local samplers struggle on the rugged, high-frequency functions such models learn. We observe that standard neural-network training implicitly produces a coarse-to-fine sequence of models. Early checkpoints suppress high-degree / high-frequency components (Boolean monomials; spherical harmonics under NTK), while later checkpoints restore detail. We exploit this by running a simple annealed sampler across the training trajectory, using early checkpoints for high-mobility proposals and later ones for refinement. In the Boolean domain, this can turn the exponential bottleneck arising from rugged landscapes or needle gadgets into a near-linear one. In the continuous domain, under the NTK regime, this corresponds to smoothing under the NTK kernel. Requiring no additional compute, our method shows strong empirical gains across a variety of synthetic and real-world tasks, including constrained sampling tasks that diffusion models are unable to handle.
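As a rough illustration of the checkpoint-annealed sampler described in the abstract, the sketch below (not the authors' code) runs single-bit-flip Metropolis sampling on a Boolean domain, sweeping an annealing schedule given by a hypothetical list `checkpoint_energies` of per-checkpoint energy functions (early checkpoints first, final checkpoint last). How each checkpoint's energy is defined (e.g., negative logit or negative log-probability of the trained predictor) is an assumption here, not specified by the abstract.

```python
# Minimal sketch, assuming `checkpoint_energies` is a list of callables
# x -> scalar energy, ordered from early (smooth) to late (detailed) checkpoints.
import numpy as np

def sample_along_checkpoints(checkpoint_energies, dim, steps_per_stage=500, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=dim)            # random Boolean starting state
    for energy in checkpoint_energies:          # coarse-to-fine annealing stages
        e_x = energy(x)
        for _ in range(steps_per_stage):
            i = rng.integers(dim)               # propose a single bit flip
            x_new = x.copy()
            x_new[i] ^= 1
            e_new = energy(x_new)
            # Metropolis acceptance under the current checkpoint's energy
            if e_new <= e_x or rng.random() < np.exp(e_x - e_new):
                x, e_x = x_new, e_new
    return x                                    # sample refined by the final checkpoint
```

In this reading, the early-checkpoint stages supply the high-mobility moves across a smoothed landscape, and the final stage refines the sample against the fully trained model; any proposal scheme other than single bit flips would slot in the same way.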
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 21666