On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval

Published: 04 Mar 2024 · Last Modified: 02 Apr 2024 · ME-FoMo 2024 Poster · CC BY 4.0
Keywords: RNN, Chain-of-thought, Retrieval, Transformer
TL;DR: This study demonstrates a notable gap between RNNs and Transformers on the IsTree task, even with chain-of-thought reasoning, and bridges this gap by augmenting RNNs with in-context retrieval.
Abstract: This paper investigates the limitations of Recurrent Neural Networks (RNNs) on algorithmic tasks, particularly in comparison with Transformers. Focusing on IsTree, a reasoning task that asks whether a given graph is a tree, we show that RNNs with $o(n)$ parameters, even with Chain-of-Thought (CoT), cannot solve the task on graphs of size $n$, whereas Transformers can solve it with CoT using only $O(\log n)$-bit parameters. Our experiments confirm this representation gap. To overcome this limitation, we propose augmenting RNNs with in-context retrieval capabilities, specifically using regular expressions. This enhancement enables RNNs to solve IsTree and other algorithmic problems in $\mathsf{P}$ while maintaining their memory efficiency, closing the gap with Transformers.
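For reference, IsTree itself is an easy problem in $\mathsf{P}$: a graph on $n$ vertices is a tree if and only if it has exactly $n-1$ edges and is connected. The sketch below (a hypothetical `is_tree` helper over an edge-list input, not the paper's CoT or retrieval procedure) is a minimal illustration of the decision task the models are asked to perform.

```python
from collections import deque

def is_tree(n, edges):
    """Decide IsTree: a graph on n vertices is a tree iff it has
    exactly n - 1 edges and is connected."""
    if len(edges) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS from vertex 0 to check that every vertex is reachable.
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if not seen[w]:
                seen[w] = True
                queue.append(w)
    return all(seen)

# A path on 4 vertices is a tree; closing it into a cycle is not.
print(is_tree(4, [(0, 1), (1, 2), (2, 3)]))          # True
print(is_tree(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # False
```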
Submission Number: 103