Keywords: Large language model, Markov chain Monte Carlo
TL;DR: We can distill extrapolative sequence transformations from Markov chains.
Abstract: Most successful applications of deep learning involve similar training and test conditions. However, some generative tasks require samples that improve desirable properties beyond previously known values, which in turn requires generating novel hypotheses that extrapolate beyond the training data. While large language models have been successfully extended to a variety of sequence modeling problems, greedy autoregressive sampling can struggle to explore the solution space sufficiently to extrapolate, especially when the properties of interest are global to the sequence. Sequence-level sampling methods such as Markov chain Monte Carlo (MCMC), on the other hand, offer theoretical guarantees on capturing the distribution of interest, but suffer from the curse of dimensionality in discrete structured spaces. We propose a new approach that bridges the gap between MCMC and autoregressive sampling and may be viewed as off-policy reinforcement learning: selected states from Markov chains serve as training data for an autoregressive inference network, which can then generate novel sequences at test time that extrapolate along the sequence-level properties of interest. We validate the approach on three problems: protein sequence design, text sentiment control, and text anonymization. We find that the learned inference network confers generalization benefits comparable to (and sometimes better than) those of the slow sampling process, with the additional benefit of high sample efficiency.
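The distillation recipe sketched in the abstract (run Markov chains over sequences, keep high-scoring states, fit an autoregressive model to them by maximum likelihood) can be illustrated with a minimal toy example. The snippet below is not the authors' implementation: the alphabet, the `property_score` function, the single-site mutation proposal, and the bigram-count "inference network" are all illustrative stand-ins chosen so the sketch runs self-contained.

```python
# Minimal sketch of MCMC-to-autoregressive distillation on a toy problem.
# All names and design choices here are hypothetical stand-ins, not the
# paper's actual models, tasks, or hyperparameters.
import math
import random
from collections import defaultdict

ALPHABET = "ACDE"  # toy alphabet (e.g., a reduced amino-acid set)

def property_score(seq):
    # Hypothetical sequence-level property to improve: fraction of 'A's.
    return seq.count("A") / len(seq)

def propose(seq):
    # Single-site mutation proposal for the Markov chain.
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

def run_chain(seed, steps=2000, temperature=0.1):
    # Metropolis-style chain targeting exp(property_score / temperature).
    chain, current = [], seed
    for _ in range(steps):
        candidate = propose(current)
        delta = property_score(candidate) - property_score(current)
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            current = candidate
        chain.append(current)
    return chain

# 1) Collect chains and keep only high-scoring states as distillation data
#    (the "selected states" used as training targets).
random.seed(0)
data = []
for _ in range(20):
    seed = "".join(random.choice(ALPHABET) for _ in range(12))
    chain = run_chain(seed)
    chain.sort(key=property_score, reverse=True)
    data.extend(chain[:50])

# 2) Fit a toy autoregressive model (bigram counts) on the selected states.
#    In the paper this role is played by a neural inference network trained
#    by maximum likelihood, i.e., off-policy behavior cloning of the chain.
counts = defaultdict(lambda: defaultdict(int))
for seq in data:
    prev = "^"  # start symbol
    for ch in seq:
        counts[prev][ch] += 1
        prev = ch

def sample(length=12):
    # Fast autoregressive generation from the distilled model.
    prev, out = "^", []
    for _ in range(length):
        nxt = counts[prev] or {c: 1 for c in ALPHABET}  # uniform fallback
        chars, weights = zip(*nxt.items())
        prev = random.choices(chars, weights=weights)[0]
        out.append(prev)
    return "".join(out)

best = max((sample() for _ in range(10)), key=property_score)
print(best, property_score(best))
```

The point of the sketch is the division of labor: the chain pays the exploration cost once at training time, while the distilled autoregressive model amortizes it, producing high-scoring sequences in a single forward pass at test time.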
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11835