Universal In-Context Approximation By Prompting Fully Recurrent Models

Aleksandar Petrov; Tom A. Lamb; Alasdair Paren; Philip Torr; Adel Bibi

Universal In-Context Approximation By Prompting Fully Recurrent Models

Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip Torr, Adel Bibi

Published: 25 Sept 2024, Last Modified: 06 Nov 2024NeurIPS 2024 posterEveryoneRevisionsBibTeXCC BY-NC-SA 4.0

Keywords: prompting, universal approximation, in-context learning, recurrent models, rnn, ssm

TL;DR: We show that fully recurrent models, e.g., RNNs, LSTMs, GRUs and SSMs, can be universal in-context approximators

Abstract:

Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve be universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks. We also study the role of multiplicative gating and observe that architectures incorporating such gating (e.g., LSTMs, GRUs, Hawk/Griffin) can implement certain operations more stably, making them more viable candidates for practical in-context universal approximation.

Supplementary Material: zip

Primary Area: Generative models

Submission Number: 4737

Loading