Improving Bilinear RNN with Closed-loop Control

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 spotlight · CC BY 4.0
Keywords: Sequence Modeling
Abstract: Recent efficient sequence modeling methods, such as Gated DeltaNet, TTT, and RWKV-7, have achieved performance improvements by supervising recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, resulting in a bilinear recursive structure. In this paper, we first introduce the concept of Bilinear RNNs together with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency on both language modeling and vision tasks.
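
The abstract describes a scalar-plus-low-rank state transition in the style of the Delta learning rule. As a rough illustration of what such a bilinear recurrence looks like, here is a minimal sequential PyTorch sketch, assuming a generic Gated-DeltaNet-style update; the tensor names (`q`, `k`, `v`, `alpha`, `beta`) are illustrative, and the exact Comba feedback corrections and the chunk-wise parallel Triton kernel are defined in the paper, not here.

```python
import torch

def bilinear_rnn_reference(q, k, v, alpha, beta):
    """Sequential reference for a scalar-plus-low-rank bilinear recurrence (sketch only).

    Shapes (single head): q, k, v are (T, d); alpha, beta are (T,).
    Update assumed here: S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T,
    i.e., a scalar decay plus a rank-1 delta-rule correction; Comba's additional
    state/output feedback terms are not modeled in this sketch.
    """
    T, d = k.shape
    S = torch.zeros(d, d, dtype=q.dtype, device=q.device)  # recurrent state (value x key)
    outputs = []
    for t in range(T):
        kt, vt, qt = k[t], v[t], q[t]
        # scalar decay (alpha_t) plus rank-1 "erase" along k_t, then rank-1 "write" of v_t
        S = alpha[t] * (S - beta[t] * torch.outer(S @ kt, kt)) + beta[t] * torch.outer(vt, kt)
        outputs.append(S @ qt)  # read the state with the query
    return torch.stack(outputs)
```

In practice such models replace this O(T) loop with a chunk-wise parallel formulation for training efficiency; the loop above is only a readable reference for the recurrence itself.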
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 16847