Research Area: Compute efficient LMs
Keywords: linear attention, long convolution, linear RNN, Efficient LM, Linear complexity
TL;DR: HGRN2 improves upon HGRN by introducing an outer product-based gating mechanism that enlarges the recurrent state, enhancing memory capacity and yielding better performance.
Abstract: Hierarchically gated linear RNN (HGRN) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments consistently verify the advantage of HGRN2 over HGRN across different settings, and show that it is competitive with other recurrent models.
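For illustration, below is a minimal NumPy sketch of the kind of outer product-based state expansion the abstract describes: the vector hidden state of a gated linear RNN is replaced by a matrix-valued state updated via an outer product, and the output is read out with a query vector, which gives the linear attention interpretation. The variable names, dimensions, and exact gating form here are assumptions for exposition, not the paper's specification.

```python
import numpy as np

def outer_product_gated_rnn(x, Wf, Wv, Wq):
    """Toy recurrence with an outer product-expanded state (conceptual sketch).

    Instead of a vector hidden state, the state S is a d x d matrix updated
    by an outer product, so the recurrent state grows without adding any
    recurrent parameters.
    """
    seq_len, _ = x.shape
    d = Wf.shape[1]
    S = np.zeros((d, d))  # expanded (matrix-valued) recurrent state
    outputs = []
    for t in range(seq_len):
        f = 1.0 / (1.0 + np.exp(-(x[t] @ Wf)))  # forget gate in (0, 1)
        k = 1.0 - f                              # input gate, reused as a "key"
        v = x[t] @ Wv                            # value vector
        q = x[t] @ Wq                            # query vector
        S = f[:, None] * S + np.outer(k, v)      # gated decay + outer-product update
        outputs.append(q @ S)                    # query-based readout (linear-attention view)
    return np.stack(outputs)

# Tiny usage example with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
d_model = 8
x = rng.standard_normal((16, d_model))
Wf, Wv, Wq = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
print(outer_product_gated_rnn(x, Wf, Wv, Wq).shape)  # (16, 8)
```

Reading the complement of the forget gate as the key of a linear attention layer is what allows the recurrence to be computed with chunkwise-parallel, hardware-efficient training, as the abstract notes.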
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 649