From Transformer to Transponder: Introducing Contextual Modulation Training for Residual Learning in LLMs
Keywords: deep learning, residual connection, modulation training, contextual scaling
Abstract: Transformers are the backbone of state-of-the-art systems across language, vision, and multimodal learning tasks, yet the relevance scale of their functional blocks (self-attention and feed-forward networks) is typically constant across inputs and depth. Motivated by neuro-glial and epigenetic mechanisms, in which glial cells and epigenetic processes modulate when and how neurons or genes express their activity, we introduce the *contextual modulator*: a lightweight, input-aware, neuro-glia-inspired meta-learner that rescales the outputs of linear sublayers within a block at token- and channel-level granularity. The modulator is implemented via compact parametric functions and adds negligible parameter overhead. Building on this idea, we propose Transponder, which integrates contextual modulators throughout Transformer blocks to endow functional residual architectures with fine-grained, input-adaptive control. Transponder outperforms six other scaling and normalization methods across LLaMA backbones ranging from 60M to 1B parameters, yielding consistent perplexity reductions with $\sim 1\%$ additional parameters. Analysis reveals depth-, module-, and token-specific scaling patterns, indicating that learned modulators act as input-adaptive regulators of residual information flow. Transponder thus offers a simple, general mechanism for hierarchically meta-learning the base components of Transformer-based models with context-sensitive modulators, delivering robust and significant performance improvements without substantial architectural changes.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21111
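The abstract describes the contextual modulator only at a high level. As an illustration, the following minimal PyTorch sketch shows one way an input-aware, token- and channel-wise gate could be attached to the residual updates of a pre-norm Transformer block. All specifics here (the class names `ContextualModulator` and `ModulatedBlock`, the bottleneck size, and the `1 + tanh(...)` gating form) are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class ContextualModulator(nn.Module):
    """Hypothetical sketch: an input-aware gate that rescales a sublayer's
    output per token and per channel, using a small bottleneck projection
    to keep the parameter overhead small."""

    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        # Compact parametric function: down-project, nonlinearity, up-project.
        self.down = nn.Linear(d_model, bottleneck, bias=False)
        self.up = nn.Linear(bottleneck, d_model, bias=False)
        nn.init.zeros_(self.up.weight)  # start as identity scaling (gate = 1)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) block input used as context
        # sublayer_out: attention or FFN output of the same shape
        gate = 1.0 + torch.tanh(self.up(torch.relu(self.down(x))))
        return gate * sublayer_out  # token- and channel-wise rescaling


class ModulatedBlock(nn.Module):
    """Pre-norm Transformer block with a modulator on each sublayer
    (one possible placement; the paper's exact wiring may differ)."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mod_attn = ContextualModulator(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.mod_ffn = ContextualModulator(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.mod_attn(h, attn_out)    # modulated residual update
        h = self.norm2(x)
        x = x + self.mod_ffn(h, self.ffn(h))  # modulated residual update
        return x
```

Zero-initializing the up-projection makes each gate start at 1, so the modulated block initially behaves like a standard residual block and the modulators learn deviations from it during training; this is a common adapter-style design choice and only an assumption about how such a module might be initialized.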