All Context Aware Reservoir Transformer

ACL ARR 2024 June Submission 1003 Authors

13 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The performance of language processing is largely constrained by how much of the surrounding context a model can access. However, the Transformer, one of the most powerful neural network architectures, has its input length restricted by quadratic time and memory complexity. Despite extensive work on improving its efficiency, long contexts still demand large computational resources during training. We propose a novel reservoir Transformer that bounds learning to linear time by handling different context lengths in a cascaded way. For the long-term context, a reservoir with a non-linear readout learns sample dependencies from the beginning to the end of a sequential dataset; for the medium-term context, such as previous sentences, we apply a recurrent memory mechanism; and for the short-term dependencies within a sentence, we learn with the Transformer. Experiments show that our reservoir Transformer improves BERT and Blenderbot performance and significantly increases prediction accuracy in language modeling, text classification, and chatbot tasks over state-of-the-art methods.
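The abstract describes a three-level cascade: a fixed reservoir with a trained non-linear readout for long-term context, a recurrent memory for medium-term context, and a Transformer for short-term, within-sentence dependencies. The sketch below is a minimal, hypothetical illustration of that cascade in PyTorch; all module names, hyperparameters, and the specific way the three components are combined are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of the cascaded reservoir Transformer described in the
# abstract. The reservoir weights are fixed (echo-state style) so the
# long-term pass runs in linear time; only the readout, recurrent memory,
# and Transformer are trained. Names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class CascadedReservoirTransformer(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, reservoir_size=512,
                 nhead=4, num_layers=2, spectral_radius=0.9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        # Long-term context: fixed random reservoir, rescaled to the target
        # spectral radius, plus a trained non-linear readout.
        W = torch.randn(reservoir_size, reservoir_size)
        W = W * (spectral_radius / torch.linalg.eigvals(W).abs().max())
        self.register_buffer("W_res", W)
        self.W_in = nn.Linear(d_model, reservoir_size, bias=False)
        self.W_in.weight.requires_grad_(False)  # reservoir input map is also fixed
        self.readout = nn.Sequential(nn.Linear(reservoir_size, d_model), nn.Tanh())

        # Medium-term context: trained recurrent memory over recent sentences.
        self.memory = nn.GRU(d_model, d_model, batch_first=True)

        # Short-term context: Transformer over the current sentence.
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, res_state=None):
        x = self.embed(tokens)                      # (batch, seq, d_model)
        if res_state is None:
            res_state = x.new_zeros(x.size(0), self.W_res.size(0))

        # Linear-time reservoir update; the state is carried across calls so
        # it can span the whole sequential dataset.
        states = []
        for t in range(x.size(1)):
            res_state = torch.tanh(self.W_in(x[:, t]) + res_state @ self.W_res.T)
            states.append(res_state)
        long_ctx = self.readout(torch.stack(states, dim=1))

        # Medium-term recurrent memory, then short-term Transformer attention.
        med_ctx, _ = self.memory(x + long_ctx)
        short_ctx = self.transformer(x + med_ctx)
        return self.lm_head(short_ctx), res_state.detach()
```

Because the reservoir weights are never trained, only the readout, GRU, and Transformer parameters receive gradients, which is one plausible way to keep the long-term pass at linear cost while the quadratic attention is confined to single sentences.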
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: machine learning; neural networks; transformer; BERT
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Theory
Languages Studied: English
Submission Number: 1003