Context-Conditioned Linear Layers for Efficient Transformers

ACL ARR 2026 January Submission3969 Authors

04 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Transformers, Parameter Efficiency, Dynamic Linear Layers, Context Conditioning, Language Modeling, Efficient Transformers
Abstract: Standard Transformers rely on scaling static dense layers for capacity, which introduces significant memory and computational redundancy. Existing efficiency methods such as Mixture-of-Experts (MoE) and static low-rank factorization trade routing complexity or limited expressivity for those savings. We introduce Context-Conditioned Linear Layers (CCL), a drop-in replacement for dense layers in which static weight matrices give way to a dynamic composition mechanism. CCL learns a compact Basis Dictionary and a lightweight Global Context Manager; for each token, the model modulates the basis to construct a unique, token-specific linear operator. This yields high expressed capacity, spanning a high-dimensional subspace across the sequence, while keeping stored capacity small. Empirically, CCL significantly improves the perplexity–parameter trade-off: a 10.3M-parameter model achieves lower perplexity than a 70.5M-parameter dense baseline, a 6.8× reduction in size with superior performance, all while preserving hardware-friendly dense linear algebra.
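The abstract's composition mechanism can be sketched numerically. The snippet below is a minimal, hypothetical NumPy illustration, not the paper's implementation: it assumes the Global Context Manager is a single linear map producing per-token softmax mixing coefficients over K basis matrices (names `ccl_forward`, `ctx_w`, `ctx_b` are illustrative), and it applies the token-specific operator without ever materializing it.

```python
import numpy as np

def ccl_forward(x, basis, ctx_w, ctx_b):
    """Hypothetical sketch of a Context-Conditioned Linear layer.

    x:      (seq, d_in)        token representations
    basis:  (K, d_in, d_out)   shared Basis Dictionary
    ctx_w:  (d_in, K), ctx_b: (K,)
            a stand-in for the lightweight Global Context Manager,
            producing per-token mixing coefficients (an assumption;
            the paper's actual conditioning may differ).
    """
    # Per-token coefficients over the K basis matrices
    # (softmax mixing is one plausible choice).
    logits = x @ ctx_w + ctx_b                      # (seq, K)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    coeffs = np.exp(logits)
    coeffs /= coeffs.sum(axis=1, keepdims=True)

    # Token-specific operator W_t = sum_k coeffs[t, k] * basis[k],
    # applied without materializing any W_t: y_t = x_t @ W_t.
    proj = np.einsum('td,kdo->tko', x, basis)       # (seq, K, d_out)
    y = np.einsum('tk,tko->to', coeffs, proj)       # (seq, d_out)
    return y
```

Note that stored capacity is only the K basis matrices plus the small context map, while the expressed operator varies per token, which is the trade-off the abstract highlights.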
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, efficient models, parameter-efficient training, model architectures, sparse models, low-rank methods, language modeling, scaling
Contribution Types: NLP engineering experiment, Approaches for low-compute settings (efficiency), Theory
Languages Studied: English
Submission Number: 3969