Research Area: Data, Compute efficient LMs, Engineering for large LMs
Keywords: large language model, scaling laws, open source, pretraining, RNN
TL;DR: We improve upon the design of RWKV models, a family of RNN-based language models with computational benefits compared to transformers
Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and found that they achieve competitive performance across a wide variety of benchmarks.
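As a reading aid, the following is a minimal PyTorch sketch of one head of the kind of matrix-valued recurrence with a data-dependent decay that the abstract describes. It is written from the abstract alone: the tensor names (r, k, v, w, u), the shapes, and the placement of the per-token bonus u are assumptions, not the paper's exact formulation.

```python
import torch

def dynamic_recurrence(r, k, v, w, u):
    """Sketch of a matrix-valued linear recurrence with per-step decay.

    r, k, v: (T, D) receptance, key, and value per time step (assumed names)
    w:       (T, D) data-dependent decay in (0, 1), one vector per step
    u:       (D,)   bonus weighting for the current token (assumed)
    Returns per-step outputs of shape (T, D).
    """
    T, D = r.shape
    S = torch.zeros(D, D)  # matrix-valued state, one per head
    outs = []
    for t in range(T):
        kv = torch.outer(k[t], v[t])                  # rank-1 update
        outs.append(r[t] @ (torch.diag(u) @ kv + S))  # read out state
        S = torch.diag(w[t]) @ S + kv  # per-step decay (the dynamic case);
                                       # a fixed w would give a static decay
    return torch.stack(outs)
```

The key property is that each step costs O(D^2) regardless of sequence length, which is the RNN-style inference efficiency the abstract refers to.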
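The tokenizer is described only as being based on greedy matching; a common reading is longest-prefix matching against a fixed vocabulary. The sketch below illustrates that idea; the function name, vocabulary format, and fallback behavior are hypothetical, not taken from the paper.

```python
def greedy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenization: at each position, consume the
    longest vocabulary entry that prefixes the remaining text."""
    max_len = max(len(tok) for tok in vocab)
    ids, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            # No match: skip one character. A real tokenizer would
            # guarantee coverage, e.g. with single-byte entries.
            i += 1
    return ids
```

For example, with vocab = {"ab": 0, "a": 1, "b": 2}, the input "aba" tokenizes to [0, 1]. Unlike BPE-style merge loops, a single greedy pass over the text suffices, which is where the speed benefit comes from.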
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 422