Keywords: Language Modeling, Retentive Network
Abstract: The Retentive Network (RetNet) has recently emerged as a formidable successor to the Transformer architecture. Although the self-attention mechanism excels at capturing global dependencies, its inherent quadratic complexity imposes significant memory constraints and inhibits scalability during long-sequence modeling. To overcome these challenges, RetNet introduces an innovative retention mechanism that integrates the inductive bias of recurrent neural networks with the parallelizable training advantages of attention-based models. This unified representation allows RetNet to achieve O(1) per-token inference and linear-time training without sacrificing representational capacity. Despite the growing body of research demonstrating the efficacy of RetNet across diverse fields such as natural language processing, computer vision, and time-series analysis, a systematic synthesis of this literature has so far been unavailable. This paper presents the first comprehensive survey of Retentive Networks through a detailed examination of their architectural foundations, core innovations, and specialized variants. Furthermore, we provide a multidisciplinary analysis of their applications, ranging from basic sequence tasks to complex cross-modal scenarios. Finally, we offer prospective insights and suggest strategic avenues for future inquiry to facilitate the continued evolution of RetNet in both academic research and large-scale industrial applications.
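To make the dual-form claim in the abstract concrete, the following is a minimal sketch of the retention mechanism's two equivalent computations: a parallel form with a causal decay mask for training, and a recurrent form with a fixed-size state for inference. The decay matrix D (with entries gamma^(n-m) for n >= m) and the state update S_t = gamma * S_{t-1} + k_t^T v_t follow the original RetNet formulation; the single-head setup, the omission of xPos rotation and group normalization, and the function names are simplifications introduced here for illustration.

```python
import torch

def parallel_retention(Q, K, V, gamma):
    """Parallel (training) form: (Q K^T ⊙ D) V, where
    D[n, m] = gamma^(n - m) if n >= m, else 0 (causal decay mask)."""
    T = Q.shape[0]
    n = torch.arange(T).unsqueeze(1)  # query positions (column)
    m = torch.arange(T).unsqueeze(0)  # key positions (row)
    D = (gamma ** (n - m).clamp(min=0)) * (n >= m)
    return (Q @ K.T * D) @ V

def recurrent_retention(Q, K, V, gamma):
    """Recurrent (inference) form: S_t = gamma * S_{t-1} + k_t^T v_t,
    o_t = q_t S_t — constant memory and cost per generated token."""
    S = torch.zeros(K.shape[1], V.shape[1])
    outputs = []
    for t in range(Q.shape[0]):
        S = gamma * S + torch.outer(K[t], V[t])  # rank-1 state update
        outputs.append(Q[t] @ S)
    return torch.stack(outputs)

# The two forms produce the same output up to floating-point tolerance.
Q, K, V = torch.randn(3, 8, 4).unbind(0)
assert torch.allclose(parallel_retention(Q, K, V, 0.9),
                      recurrent_retention(Q, K, V, 0.9), atol=1e-5)
```

The equivalence holds because unrolling the recurrence gives o_n = sum over m <= n of gamma^(n-m) (q_n · k_m) v_m, which is exactly the masked parallel product; this is what lets RetNet train like attention yet decode like an RNN.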
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: Language Modeling, Retentive Network
Contribution Types: Surveys
Languages Studied: English
Submission Number: 706