LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

Published: 02 Mar 2026, Last Modified: 02 Mar 2026
Venue: MALGAI
License: CC BY 4.0
Keywords: parallel reasoning, rope, positional encoding
Abstract: Parallel LLM test-time scaling techniques (e.g., best-of-$N$) draw $N>1$ sequences conditioned on the same input prompt. Such techniques improve accuracy, and since the $N$ generations are batched, they better utilize compute cores. Traditionally, however, each sequence in the batch is generated independently and therefore does not reuse compute, intermediate generations, or observations from the other sequences. In this paper, we propose LaneRoPE to enable coordination and collaboration among the $N>1$ sequences at generation time. LaneRoPE involves two key ideas: (a) an inter-sequence attention mask that makes the sampling of sequences dependent on one another; and (b) a RoPE extension that injects positional information capturing relative positions between tokens, both within and across sequences. We evaluate our approach on mathematical reasoning tasks and find promising results: LaneRoPE enables collaboration among sequences, yielding additional accuracy gains under a limited generated-sequence-length budget. Importantly, since LaneRoPE enables coordination with minimal changes to the underlying LLM architecture and introduces negligible new learned parameters, it is an appealing way to rapidly add collaborative reasoning to existing LLM inference pipelines.
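The abstract names two ingredients but gives no implementation details, so the following is only a hedged sketch of what they *could* look like: a standard RoPE rotation (whose dot products depend only on relative position, the property a cross-sequence extension would build on) and a hypothetical inter-sequence attention mask in which a token generated at step $t$ in one lane may attend to the prompt, to itself, and to all lanes' tokens from earlier steps. The layout, the mask rule, and the function names are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Standard RoPE: rotate feature pairs of x by position-dependent angles.

    Key property (exercised below): the dot product of two rotated vectors
    depends only on the *relative* offset of their positions, which is what
    a lane-aware extension would exploit for cross-sequence attention.
    """
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)       # per-pair frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def lane_mask(prompt_len, steps, n_lanes):
    """Hypothetical inter-sequence mask (not the paper's exact scheme).

    Token layout: [prompt tokens, then step-major interleaving of lanes].
    A lane token at generation step t attends to the prompt, to itself,
    and to every lane's tokens from steps strictly before t.
    """
    total = prompt_len + steps * n_lanes
    allow = np.zeros((total, total), dtype=bool)
    allow[:, :prompt_len] = True                    # all tokens see the prompt
    # prompt remains causal over itself
    allow[:prompt_len, :prompt_len] = np.tril(
        np.ones((prompt_len, prompt_len), dtype=bool))
    for i in range(prompt_len, total):
        step_i = (i - prompt_len) // n_lanes
        # all lanes' tokens from strictly earlier steps
        allow[i, prompt_len:prompt_len + step_i * n_lanes] = True
        allow[i, i] = True                          # self
    return allow
```

As a sanity check, `rope_rotate(q, 3) @ rope_rotate(k, 5)` equals `rope_rotate(q, 13) @ rope_rotate(k, 15)`: only the offset of 2 matters, so lanes sharing a common position scheme would see consistent relative distances to one another's tokens.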
Submission Number: 89