Mamba State-Space Models Are Lyapunov-Stable Learners

Published: 06 Mar 2025 · Last Modified: 12 Apr 2025 · ICLR 2025 DeLTa Workshop Poster · License: CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Mamba, SSMs, LLMs, PEFT, mixed-precision, Lyapunov
TL;DR: Transformer LLMs can diverge under mixed-precision PEFT, whereas Mamba LLMs trained with mixed-precision PEFT are guaranteed to remain close to their full-precision, fully fine-tuned counterparts.
Abstract:

Compute-efficient methods such as mixed-precision fine-tuning (MPFT) and parameter-efficient fine-tuning (PEFT) have become standard tools for Transformer-based large language models (LLMs). Although these methods are near-ubiquitously adopted, we empirically show that, under different combinations of MPFT and PEFT, Transformer LLMs may drastically diverge from their respective full-precision counterparts. In stark contrast, we show that recent Mamba LLMs based on state-space models (SSMs) are significantly more robust to the changes introduced by combinations of MPFT and PEFT. This robustness stems from the recurrent dynamics of Mamba SSMs, which we prove are guaranteed to be stable using dynamical systems theory (in particular, Lyapunov exponents). Additionally, we demonstrate how targeting different Mamba parameters for low-rank adaptation provides regularization and affects PEFT generalization. We conclude by using MPFT and PEFT to study Mamba LLMs' in-context learning (ICL) abilities on natural language tasks in a novel setting, thus supplementing other recent work.
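As an illustrative sketch (not the paper's exact theorem), the stability notion the abstract alludes to is the standard maximal Lyapunov exponent of a linear time-varying recurrence over the SSM hidden state:

\[
h_t = \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t,
\qquad
\lambda_{\max} \;=\; \limsup_{T \to \infty} \frac{1}{T} \log \Bigl\| \prod_{t=1}^{T} \bar{A}_t \Bigr\|,
\qquad
\lambda_{\max} < 0 \;\Rightarrow\; \text{exponentially stable state dynamics.}
\]

Assuming the standard selective-SSM discretization $\bar{A}_t = \exp(\Delta_t A)$ with step sizes $\Delta_t > 0$ and entries of $A$ having negative real part, every factor in the product is a contraction, so $\lambda_{\max} \le 0$; the paper's formal result makes this kind of guarantee precise for Mamba.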

Submission Number: 28