Research on improving boundary awareness in long-context language models with BiMix-LM, a method based on TV-regularized dual-spectral routing.
Keywords: Long-context language modeling; Transformer alternatives; Token mixing; Dual-spectral routing; Frequency-band gating; DCT; Chebyshev polynomials; Total variation (TV) regularization; Convex optimization; Interpretability; Boundary-aware modeling; Gate distillation.
Abstract: Long-context language modeling exhibits heterogeneous positional structure: smooth global regularities (e.g., topic drift and stylistic rhythm) co-exist with sharp, boundary-localized transitions (e.g., paragraph/section breaks and code delimiters). Standard Transformers typically rely on a single positional scheme and a single token mixer, which makes it hard to separate global versus boundary-sensitive phenomena and limits interpretability. This paper introduces BiMix-LM, a dual-spectral gated token mixer designed to decouple these two behaviors for long sequences. BiMix-LM constructs two parallel spectral branches over token positions: a DCT branch targeting smooth, quasi-periodic structure and a Chebyshev branch emphasizing boundary-sensitive variation. To obtain interpretable routing across positional frequency bands, BiMix-LM employs band-wise gates optimized on the frequency axis with a TV-regularized, box-constrained convex objective, yielding piecewise-smooth gate maps; the optimized gates are then distilled into a lightweight gating network for end-to-end training and efficient inference.
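The band-wise gate optimization described in the abstract can be sketched as a small convex problem: fit gates to per-band utilities under a total-variation penalty on the frequency axis and a box constraint. The sketch below is an illustrative assumption, not the paper's exact formulation; the `utility` scores, the penalty weight `lam`, and the projected-subgradient solver are all hypothetical stand-ins.

```python
import numpy as np

def fit_tv_gates(utility, lam=0.3, lr=0.05, steps=2000):
    """Solve  min_g 0.5*||g - u||^2 + lam * TV(g)  s.t. 0 <= g <= 1
    by projected subgradient descent; returns the best iterate seen."""
    u = np.asarray(utility, dtype=float)
    g = np.clip(u, 0.0, 1.0)  # feasible starting point

    def objective(x):
        # Quadratic data term + 1-D total variation along frequency bands.
        return 0.5 * np.sum((x - u) ** 2) + lam * np.sum(np.abs(np.diff(x)))

    best, best_val = g.copy(), objective(g)
    for _ in range(steps):
        grad = g - u                      # gradient of the data term
        s = np.sign(np.diff(g))           # subgradient of sum |g_{k+1} - g_k|
        grad[:-1] -= lam * s              # d/dg_k   of |g_{k+1} - g_k|
        grad[1:] += lam * s               # d/dg_{k+1} of the same term
        g = np.clip(g - lr * grad, 0.0, 1.0)  # projection onto the box [0, 1]
        val = objective(g)
        if val < best_val:
            best, best_val = g.copy(), val
    return best

# Toy per-band utilities: a noisy step profile. TV regularization should
# recover a piecewise-smooth gate map, as the abstract describes.
rng = np.random.default_rng(0)
u = np.concatenate([np.full(16, 0.9), np.full(16, 0.1)])
u = u + 0.05 * rng.standard_normal(32)
gates = fit_tv_gates(u, lam=0.3)
```

In this toy run the optimized gates stay in [0, 1] and have lower total variation than the raw utilities, i.e., the noise between bands is smoothed while the step between the two band groups survives. The distillation step mentioned in the abstract would then regress a lightweight gating network onto such `gates` targets.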
Experiments on long-context benchmarks show that BiMix-LM improves the quality-efficiency trade-off under matched budgets, achieving consistent gains on multi-document QA, long-code modeling, and LRA-style tasks, while substantially increasing inference throughput.
Paper Type: Long
Research Area: Language Models
Research Area Keywords: Language Models, LLM Efficiency, Interpretability and Analysis of Models for NLP, Machine Learning for NLP, Question Answering, Natural Language Generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches for low compute settings-efficiency
Languages Studied: English, Programming languages
Submission Number: 2054