Keywords: autoregressive language modeling, parallel decoding, self-refinement
Abstract: Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. However, existing approaches typically implement iterative refinement at the application or prompting level, relying on autoregressive (AR) modeling, and the sequential token generation of AR models can lead to high inference latency. To overcome these challenges, we propose **C**ontext-Wise **Or**der-**A**gnostic **L**anguage Modeling (COrAL), which incorporates iterative refinement directly into the LLM architecture while maintaining computational efficiency. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally during generation. Leveraging the order-agnostic nature of COrAL, we introduce sliding blockwise order-agnostic decoding, which performs multi-token forward prediction and backward reconstruction within context windows. This allows the model to iteratively refine its outputs in parallel within the sliding block, effectively capturing diverse dependencies without the high inference cost of sequential generation. Our findings reveal a quality–speed trade-off, showing how COrAL augments the self-refinement capabilities of conventional autoregressive models without requiring additional architectural components or extensive pre-training. This work underscores the promise of order-agnostic modeling in advancing LLMs for more efficient and effective natural language processing.
Submission Number: 102
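
The following is a minimal sketch of the sliding blockwise order-agnostic decoding loop described in the abstract, included only to illustrate the forward-prediction / backward-reconstruction pattern. The `predict` callable, `mask_id` placeholder, and the rule of accepting every refined token are assumptions made for illustration, not COrAL's actual interface or acceptance scheme.

```python
# Illustrative sketch of sliding blockwise order-agnostic decoding.
# Assumes a hypothetical `predict(context, positions)` that returns one
# proposed token per requested position, conditioned on the rest of `context`.

from typing import Callable, List, Sequence


def sliding_blockwise_decode(
    predict: Callable[[Sequence[int], Sequence[int]], List[int]],
    prompt: List[int],
    block_size: int = 4,
    refine_steps: int = 2,
    max_new_tokens: int = 16,
    mask_id: int = -1,
    eos_id: int = 0,
) -> List[int]:
    """Generate tokens block by block.

    Forward prediction drafts `block_size` positions in one parallel call;
    backward reconstruction then re-predicts each drafted position, now
    conditioned on both earlier and later tokens in the block, for
    `refine_steps` iterations before the window slides forward.
    """
    tokens = list(prompt)

    while len(tokens) - len(prompt) < max_new_tokens:
        start = len(tokens)
        positions = list(range(start, start + block_size))

        # Forward prediction: append placeholders and draft the block at once.
        draft = tokens + [mask_id] * block_size
        for pos, tok in zip(positions, predict(draft, positions)):
            draft[pos] = tok

        # Backward reconstruction: iteratively re-predict positions inside the
        # window, refining each draft in parallel using the full block context.
        for _ in range(refine_steps):
            refined = predict(draft, positions)
            for pos, tok in zip(positions, refined):
                draft[pos] = tok

        tokens = draft
        # Stop early if the refined block produced an end-of-sequence token.
        if eos_id in tokens[start:]:
            return tokens[: tokens.index(eos_id, start)]

    return tokens


if __name__ == "__main__":
    # Toy stand-in for an order-agnostic model: predicts (position mod 7) + 1.
    def toy_predict(context: Sequence[int], positions: Sequence[int]) -> List[int]:
        return [pos % 7 + 1 for pos in positions]

    print(sliding_blockwise_decode(toy_predict, prompt=[5, 3, 2]))
```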