Keywords: Transformers; Extrapolation; Structural attention
Abstract: Transformers form the backbone of modern large language models, but their long-context performance is limited by the dilution effect: attention mass spreads uniformly across distant positions, failing to maintain structural dependencies. Existing solutions, such as sparse or efficient attention patterns, improve efficiency but do not address the lack of structural anchoring. We introduce the Structural-Former (S-Former), which maintains a parallel structural stream that evolves recurrently to track sequential patterns independently of token content and provides structural anchors for attention. Unlike compressed state-space models, our approach maintains explicit structural representations that remain orthogonal to semantic content. We study two integration mechanisms: (i) attention fusion, which validates the decoupling principle by showing that the structural gate $\alpha_t$ tracks bracket depth in Dyck languages; and (ii) bias injection, a minimal and stable design that adds the structural signal to the hidden activations. Synthetic probes (Markov, Dyck, and JSON) demonstrate that the structural stream learns hierarchical and sequential rules beyond surface statistics. On WikiText-103, S-Former extrapolates stably to long contexts, reducing perplexity degradation by 76% when extrapolating to 40k tokens. These findings suggest that introducing a recurrent structural stream provides a lightweight and scalable inductive bias that substantially improves long-context extrapolation, offering a complementary direction to sparse attention or memory-based methods.
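Purely as an illustration of the bias-injection mechanism described in the abstract, the sketch below shows one way a content-independent recurrent structural stream could be added to hidden activations; the class names, the GRU cell, and all dimensions are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a bias-injected structural stream (illustrative only;
# the GRU choice, names, and shapes are assumptions, not the S-Former implementation).
import torch
import torch.nn as nn

class StructuralStream(nn.Module):
    """Recurrent stream that evolves over positions, independent of token content."""
    def __init__(self, d_struct: int, d_model: int):
        super().__init__()
        # Content-independent input: the same learned "tick" embedding at every step,
        # so the stream's state depends only on sequence position, not token identity.
        self.tick = nn.Parameter(torch.randn(1, 1, d_struct) * 0.02)
        self.rnn = nn.GRU(d_struct, d_struct, batch_first=True)
        self.to_bias = nn.Linear(d_struct, d_model)

    def forward(self, batch: int, seq_len: int) -> torch.Tensor:
        ticks = self.tick.expand(batch, seq_len, -1)   # (batch, seq_len, d_struct)
        states, _ = self.rnn(ticks)                    # recurrent structural states
        return self.to_bias(states)                    # (batch, seq_len, d_model)

class BiasInjectedBlock(nn.Module):
    """Transformer block that adds the structural signal into hidden activations."""
    def __init__(self, d_model: int, n_heads: int, d_struct: int = 64):
        super().__init__()
        self.struct = StructuralStream(d_struct, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        x = x + self.struct(b, t)  # bias injection into hidden activations
        h = self.norm1(x)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out
        x = x + self.ff(self.norm2(x))
        return x

# Usage on a toy hidden-state tensor.
block = BiasInjectedBlock(d_model=128, n_heads=4)
y = block(torch.randn(2, 16, 128))
print(y.shape)  # torch.Size([2, 16, 128])
```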
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17278