Research Area: Compute efficient LMs
Keywords: linear attention, long convolution, linear RNN, Efficient LM, Linear complexity
TL;DR: HGRN2 improves upon HGRN by introducing an outer product-based gating mechanism that enlarges the recurrent state, enhancing memory capacity and yielding better performance.
Abstract: Hierarchically gated linear RNN (HGRN) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments consistently verify the advantage of HGRN2 over HGRN across different settings, and show that it is competitive with other recurrent models.
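For illustration, below is a minimal NumPy sketch of the kind of outer product-based state expansion the abstract describes: the vector hidden state of a gated linear RNN is replaced by a matrix-valued state updated via an outer product, and the output is read out with a query vector, which gives the linear attention interpretation. The variable names, dimensions, and exact gating form here are assumptions for exposition, not the paper's specification.

```python
import numpy as np

def outer_product_gated_rnn(x, Wf, Wv, Wq):
    """Toy recurrence with an outer product-expanded state (conceptual sketch).

    Instead of a vector hidden state, the state S is a d x d matrix updated
    by an outer product, so the recurrent state grows without adding any
    recurrent parameters.
    """
    seq_len, _ = x.shape
    d = Wf.shape[1]
    S = np.zeros((d, d))  # expanded (matrix-valued) recurrent state
    outputs = []
    for t in range(seq_len):
        f = 1.0 / (1.0 + np.exp(-(x[t] @ Wf)))  # forget gate in (0, 1)
        k = 1.0 - f                              # input gate, reused as a "key"
        v = x[t] @ Wv                            # value vector
        q = x[t] @ Wq                            # query vector
        S = f[:, None] * S + np.outer(k, v)      # gated decay + outer-product update
        outputs.append(q @ S)                    # query-based readout (linear-attention view)
    return np.stack(outputs)

# Tiny usage example with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
d_model = 8
x = rng.standard_normal((16, d_model))
Wf, Wv, Wq = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
print(outer_product_gated_rnn(x, Wf, Wv, Wq).shape)  # (16, 8)
```

Reading the complement of the forget gate as the key of a linear attention layer is what allows the recurrence to be computed with chunkwise-parallel, hardware-efficient training, as the abstract notes.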
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 649