Keywords: Language modeling, Sampling
TL;DR: We introduce Resample-Previous-Tokens (RPT), a sampling method that lets models revisit and replace previously generated tokens, yielding ~10% relative improvements in reasoning and coding after a short fine-tuning stage.
Abstract: Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by iteratively revisiting and potentially replacing tokens in a window of previously generated text. Fine-tuning a pretrained 8B-parameter model with RPT for only 100B tokens resulted in ~10% relative improvements on reasoning and coding benchmarks compared to standard sampling.
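To make the decoding loop concrete, below is a minimal sketch of RPT-style sampling. It is an illustration under stated assumptions, not the paper's exact fine-tuned procedure: the `model(ids)` call (returning next-token logits of shape `[1, len, vocab]`), the `window` size, and the resampling rule (redrawing each window token from the model's left-context distribution) are all hypothetical choices made for the sake of the sketch.

```python
import torch

def rpt_sample(model, prompt_ids, max_new_tokens=128, window=8, temperature=1.0):
    """Sketch of Resample-Previous-Tokens (RPT) decoding.

    After each newly sampled token, revisit the last `window` generated
    tokens and potentially replace each one by redrawing it from the
    model's predictive distribution given its (possibly updated) prefix.
    Assumes a hypothetical `model(ids)` returning logits [1, len, vocab].
    """
    ids = prompt_ids.clone()          # shape [1, prompt_len]
    prompt_len = ids.shape[1]

    for _ in range(max_new_tokens):
        # Standard autoregressive step: sample the next token.
        logits = model(ids)[:, -1, :] / temperature
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        ids = torch.cat([ids, next_id], dim=1)

        # Revisit a window of previously generated tokens (never the prompt).
        # Positions are reconsidered left to right, so later positions
        # condition on any replacements already made earlier in the window.
        start = max(prompt_len, ids.shape[1] - window)
        for pos in range(start, ids.shape[1]):
            logits = model(ids[:, :pos])[:, -1, :] / temperature
            probs = torch.softmax(logits, dim=-1)
            ids[0, pos] = torch.multinomial(probs, 1).item()

    return ids
```

In this sketch each revisited token is simply resampled in place; the fine-tuning described in the abstract suggests the actual model is trained to make such replacements useful, which this base-model illustration does not capture.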
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 13366