Keywords: Structured Text Segmentation, Reinforcement Learning, Prompt
Abstract: Structured texts -- from technical reports to AI prompts -- increasingly require segmentation into semantically meaningful components. Such texts often contain elements beyond plain language, such as code snippets, which conventional sentence-level segmentation methods cannot handle effectively. To address this, we propose BoundRL, a novel approach that jointly performs efficient token-level text segmentation and label prediction for long structured texts. Instead of generating the full text of each segment, it generates only each segment's starting tokens and reconstructs the complete segments by locating these tokens within the original text, thereby reducing inference costs by 90% and minimizing hallucination. To train models for boundary generation, BoundRL performs reinforcement learning with verifiable rewards (RLVR), jointly optimizing document reconstruction fidelity and semantic alignment. It further mitigates entropy collapse by constructing intermediate candidates -- perturbing segment boundaries and labels to create stepping stones toward higher-quality solutions. Experiments show that BoundRL enables small language models (1.7B parameters) to outperform few-shot prompting with much larger models, as well as SFT and standard RLVR baselines, on complex prompts used for LLM applications.
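The abstract's core mechanism -- emitting only each segment's starting tokens and recovering full segments by locating those tokens in the source text -- can be sketched as follows. This is a minimal illustration assuming string-level matching; the function name and the exact matching strategy are hypothetical, not taken from the paper.

```python
def reconstruct_segments(text, boundaries):
    """Recover full segments from (starting_tokens, label) pairs.

    boundaries: ordered list of (starting_tokens, label) tuples, where
    starting_tokens is the literal prefix of a segment as it appears in
    `text`. Returns a list of (segment_text, label) pairs that tile the
    original text; raises ValueError if a boundary cannot be located.
    """
    starts = []
    cursor = 0
    for start_tokens, label in boundaries:
        # Each boundary must occur at or after the previous one.
        idx = text.find(start_tokens, cursor)
        if idx == -1:
            raise ValueError(f"boundary not found: {start_tokens!r}")
        starts.append((idx, label))
        cursor = idx + 1
    segments = []
    for i, (idx, label) in enumerate(starts):
        # A segment runs until the next segment's start (or end of text).
        end = starts[i + 1][0] if i + 1 < len(starts) else len(text)
        segments.append((text[idx:end], label))
    return segments
```

Because only the short starting strings are generated, the recovered segments are verbatim slices of the input, which is what bounds hallucination and supports the verifiable reconstruction-fidelity reward.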
Paper Type: Long
Research Area: Language Models
Research Area Keywords: reinforcement learning, text segmentation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7945