Long Document Reconstruction Unlocks Scalable Long-Context RLVR

ACL ARR 2026 March Submission705 Authors

15 Mar 2026 (modified: 07 Jun 2026), ACL ARR 2026 March Submission, CC BY 4.0
Keywords: Long context, RLVR, Unsupervised
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a prominent paradigm for enhancing the capabilities (e.g., long-context understanding) of Large Language Models (LLMs). However, it often relies on gold-standard answers or explicit evaluation rubrics provided by powerful teacher models or human experts, which are costly and time-consuming to obtain. In this work, we investigate unsupervised approaches to enhancing the long-context capabilities of LLMs, eliminating the need for extensive human annotation or teacher-model supervision. Specifically, we first replace a few paragraphs in a long document with special placeholders. LLMs are then trained through reinforcement learning to reconstruct the long document by correctly identifying and sequencing the missing paragraphs from a set of candidate options. This training paradigm enables the model to capture global narrative coherence, significantly boosting long-context performance. We validate the effectiveness of our method on two widely used benchmarks, RULER and LongBench v2. Our method yields noticeable gains on RULER (nearly 10 points) and also achieves a reasonable improvement on LongBench v2, without any manually curated long-context QA data. Furthermore, we conduct extensive ablation studies to analyze the impact of reward designs, data curation strategies, training schemes, and data scaling effects on model performance. We will release our code, data, and models.
Paper Type: Long
Research Area: Efficient Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Language modeling
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 705
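
To make the task described in the abstract concrete, the following is a minimal sketch of how a reconstruction example and a verifiable reward could be built. It is an illustration under stated assumptions, not the authors' implementation: the placeholder format `<MISSING_k>`, the names `ReconstructionTask`, `build_task`, and `exact_match_reward`, and the all-or-nothing reward are hypothetical, and the candidate set here contains only the removed paragraphs (no distractors).

```python
import random
from dataclasses import dataclass


@dataclass
class ReconstructionTask:
    masked_document: str   # document with placeholders where paragraphs were removed
    candidates: list[str]  # removed paragraphs, in shuffled order
    answer: list[int]      # candidate index that fills each placeholder, in document order


def build_task(paragraphs: list[str], num_masked: int, seed: int = 0) -> ReconstructionTask:
    """Mask `num_masked` paragraphs and return the shuffled candidates plus the gold ordering."""
    rng = random.Random(seed)
    masked_positions = sorted(rng.sample(range(len(paragraphs)), num_masked))

    # Shuffle the removed paragraphs so the candidate order reveals nothing.
    order = masked_positions[:]
    rng.shuffle(order)
    candidates = [paragraphs[pos] for pos in order]

    # Hypothetical placeholder format: <MISSING_0>, <MISSING_1>, ... in document order.
    slot_of = {pos: slot for slot, pos in enumerate(masked_positions)}
    parts = [
        f"<MISSING_{slot_of[i]}>" if i in slot_of else text
        for i, text in enumerate(paragraphs)
    ]

    # Gold label: for each placeholder, the index of its paragraph among the candidates.
    answer = [order.index(pos) for pos in masked_positions]
    return ReconstructionTask("\n\n".join(parts), candidates, answer)


def exact_match_reward(predicted: list[int], answer: list[int]) -> float:
    """Assumed reward: 1.0 only if every placeholder is filled with the right candidate."""
    return 1.0 if predicted == answer else 0.0
```

Because the gold ordering is derived from the document itself, the reward can be checked automatically without annotated answers. The exact-match reward is only one possible choice; a partial-credit variant that scores each placeholder separately would be an equally plausible design.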