Abstract: While Transformer-based models have demonstrated remarkable language modeling performance, their quadratic complexity leads to high costs when processing long contexts. In contrast, recurrent neural networks (RNNs) such as linear attention and state space models have gained popularity due to their constant per-token complexity. However, these recurrent models struggle with tasks that require accurate recall of contextual information from long contexts, because all contextual information must be compressed into a constant-size recurrent state. Previous works have shown that recall ability is positively correlated with recurrent state size, yet directly training RNNs with larger recurrent states incurs substantially higher training costs. In this paper, we introduce StateX, a training pipeline for efficiently expanding the states of pre-trained RNNs through post-training. For two popular classes of RNNs, linear attention and state space models, we design post-training architectural modifications that scale up the state size with no or negligible increase in model parameters. Experiments on models with up to 1.3B parameters demonstrate that StateX efficiently enhances the recall ability of RNNs without incurring high post-training costs or compromising other capabilities.
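For context, the sketch below is not the StateX method itself; it is a minimal illustration of the standard (unnormalized) linear-attention recurrence that the abstract refers to, with all function and variable names chosen for illustration. It shows why the recurrent state per head is a d_k x d_v matrix, so recall capacity is bounded by the state size rather than by the context length, which is the quantity StateX aims to expand.

```python
# Minimal sketch (assumptions, not the paper's implementation): vanilla
# linear-attention recurrence, showing the constant-size recurrent state.
import numpy as np

def linear_attention_recurrent(q, k, v):
    """q, k: (T, d_k); v: (T, d_v). Returns outputs of shape (T, d_v).

    The whole context is compressed into S, a (d_k, d_v) matrix, so the
    model's recall capacity is bounded by d_k * d_v regardless of T.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))            # constant-size recurrent state
    out = np.zeros((T, d_v))
    for t in range(T):
        S = S + np.outer(k[t], v[t])    # write: rank-1 outer-product update
        out[t] = q[t] @ S               # read: query the compressed context
    return out

# Widening d_k enlarges the state (and thus recall capacity) while the
# per-token cost stays O(d_k * d_v), i.e. constant in sequence length.
T, d_k, d_v = 16, 64, 64
q, k, v = (np.random.randn(T, d) for d in (d_k, d_k, d_v))
y = linear_attention_recurrent(q, k, v)  # state holds d_k * d_v = 4096 values per head
```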
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: continual learning, fine-tuning
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: The trained model is intended solely for academic research and will not be deployed in any real-world application. As such, a detailed risk analysis was not deemed necessary.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 1: Introduction
B2 Discuss The License For Artifacts: No
B2 Elaboration: All resources are public and free to use for scientific research.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: Our use of existing artifacts strictly follows common research practices and remains consistent with their intended use as originally specified.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: Our training data is publicly available and widely used, but we did not take additional steps to remove possible sensitive content.
B5 Documentation Of Artifacts: No
B5 Elaboration: We did not provide detailed documentation of the artifacts used or created. However, the data were used within clearly defined research boundaries, and no artifacts were shared externally. If shared in the future, appropriate documentation will be added, including details on domains, languages, and demographic coverage.
B6 Statistics For Data: Yes
B6 Elaboration: Section 5 and Appendix B.1
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 5
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 5
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 5, Appendix B.2
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: We used AI assistants for code autocompletion and for checking typos and grammatical errors during paper writing.
Author Submission Checklist: Yes
Submission Number: 540