Track: short paper (up to 4 pages)
Keywords: Mamba, transformer, in-context learning, task mixture, labeling
Abstract: In-context learning (ICL) refers to the ability to perform new tasks from a prompt sequence of "in-context" input-output pairs, without explicit model training. Previous work has shown that State-Space Models (SSMs), particularly Mamba, are potential competitors to Transformers in ICL. However, whether they can handle mixed tasks in complicated ICL prompts remains an open question. In this work, we compare Mamba with Transformers on mixed ICL tasks, across mixing degrees from low to high and from labeled to unlabeled settings. We show that Mamba is capable of learning ICL mixtures, matching the performance of single-task ICL and Transformer baselines. Moreover, Mamba converges faster and performs more stably than Transformers, allowing it to handle longer context lengths and more complicated prompt structures. We also observe different learning dynamics across ICL tasks.
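The mixed-task prompt structure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' setup: it assumes linear-regression tasks (a common ICL benchmark family), treats the number of tasks in the mixture as the "degree" of mixing, and interprets "labeled" as attaching a task id to each in-context example. The names `build_mixed_prompt`, `sample_task`, and `apply_task` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]
# One in-context example: (task id, or None if unlabeled; input x; output y).
Example = Tuple[Optional[int], Vector, float]


def sample_task(dim: int, rng: random.Random) -> Vector:
    """Hypothetical task family: a linear function f(x) = w . x with Gaussian w."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def apply_task(w: Vector, x: Vector) -> float:
    """Evaluate the linear task w on input x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def build_mixed_prompt(
    num_tasks: int, num_examples: int, dim: int, labeled: bool, seed: int = 0
) -> Tuple[List[Vector], List[Example]]:
    """Build one ICL prompt whose examples are drawn from a mixture of tasks.

    Assumption: num_tasks sets the degree of mixing (1 = single-task prompt).
    If labeled is True, each example carries the id of the task that produced it;
    otherwise the id is hidden (None). The last example serves as the query whose
    output y the model must predict from the preceding pairs.
    """
    if num_tasks < 1 or num_examples < 1:
        raise ValueError("num_tasks and num_examples must be positive")
    rng = random.Random(seed)
    tasks = [sample_task(dim, rng) for _ in range(num_tasks)]
    prompt: List[Example] = []
    for _ in range(num_examples):
        k = rng.randrange(num_tasks)  # pick which task generates this pair
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = apply_task(tasks[k], x)
        prompt.append((k if labeled else None, x, y))
    return tasks, prompt


if __name__ == "__main__":
    tasks, prompt = build_mixed_prompt(num_tasks=3, num_examples=5, dim=2, labeled=True)
    for task_id, x, y in prompt[:-1]:
        print("context:", task_id, [round(v, 2) for v in x], round(y, 2))
    task_id, x_query, _ = prompt[-1]
    print("query:", task_id, [round(v, 2) for v in x_query])
```

Because the label flag only changes whether the task id is exposed, the same seed yields identical inputs and outputs in the labeled and unlabeled variants, so the two settings can be compared on the same underlying data.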
Submission Number: 49