Context-Conditioned Linear Layers for Efficient Transformers

ACL ARR 2026 January Submission3969 Authors

04 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Transformers, Parameter Efficiency, Dynamic Linear Layers, Context Conditioning, Language Modeling, Efficient Transformers
Abstract: Standard Transformers rely on scaling static dense layers for capacity, which introduces significant memory and computational redundancy. Existing efficiency methods such as Mixture-of-Experts (MoE) and static low-rank factorization trade routing complexity or limited expressivity for those savings. We introduce Context-Conditioned Linear Layers (CCL), a drop-in replacement for dense layers in which static weight matrices give way to a dynamic composition mechanism. CCL learns a compact Basis Dictionary and a lightweight Global Context Manager; for each token, the model modulates the basis to construct a unique, token-specific linear operator. This yields high expressed capacity, spanning a high-dimensional subspace across the sequence, while keeping stored capacity small. Empirically, CCL significantly improves the perplexity–parameter trade-off: a 10.3M-parameter model achieves lower perplexity than a 70.5M-parameter dense baseline, a 6.8× reduction in size with superior performance, all while preserving hardware-friendly dense linear algebra.
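The abstract's composition mechanism can be sketched numerically. The snippet below is a minimal, hypothetical NumPy illustration, not the paper's implementation: it assumes the Global Context Manager is a single linear map producing per-token softmax mixing coefficients over K basis matrices (names `ccl_forward`, `ctx_w`, `ctx_b` are illustrative), and it applies the token-specific operator without ever materializing it.

```python
import numpy as np

def ccl_forward(x, basis, ctx_w, ctx_b):
    """Hypothetical sketch of a Context-Conditioned Linear layer.

    x:      (seq, d_in)        token representations
    basis:  (K, d_in, d_out)   shared Basis Dictionary
    ctx_w:  (d_in, K), ctx_b: (K,)
            a stand-in for the lightweight Global Context Manager,
            producing per-token mixing coefficients (an assumption;
            the paper's actual conditioning may differ).
    """
    # Per-token coefficients over the K basis matrices
    # (softmax mixing is one plausible choice).
    logits = x @ ctx_w + ctx_b                      # (seq, K)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    coeffs = np.exp(logits)
    coeffs /= coeffs.sum(axis=1, keepdims=True)

    # Token-specific operator W_t = sum_k coeffs[t, k] * basis[k],
    # applied without materializing any W_t: y_t = x_t @ W_t.
    proj = np.einsum('td,kdo->tko', x, basis)       # (seq, K, d_out)
    y = np.einsum('tk,tko->to', coeffs, proj)       # (seq, d_out)
    return y
```

Note that stored capacity is only the K basis matrices plus the small context map, while the expressed operator varies per token, which is the trade-off the abstract highlights.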
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, efficient models, parameter-efficient training, model architectures, sparse models, low-rank methods, language modeling, scaling
Contribution Types: NLP engineering experiment, Approaches for low-compute settings (efficiency), Theory
Languages Studied: English
Submission Number: 3969