Chimera: State Space Models Beyond Sequences

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Deep Learning Architectures, Sequence Models, State Space Models, Mamba
TL;DR: Generalizing state space models to any data topology with state-of-the-art performance across diverse domains
Abstract: Powerful deep learning methods based on Transformers are used to model diverse data modalities such as sequences, images, and graphs. These methods typically use off-the-shelf modules like self-attention, which are domain-agnostic and treat data as an unordered set of elements. To improve performance, researchers employ inductive biases—such as position embeddings in sequences and images, and random walks in graphs—to inject the domain structure, or *topology*, into the model. However, these inductive biases are carefully engineered heuristics that must be designed for each modality, requiring significant research effort. In this work, we propose *Chimera*, a unified framework that mathematically generalizes state space models to incorporate the topological structure of data in a principled way. We demonstrate that our method achieves state-of-the-art performance across domains including language, vision, and graphs. Chimera outperforms BERT on the GLUE benchmark by 0.7 points, surpasses ViT by 2.6% on ImageNet-1k classification accuracy, and outperforms all baselines on the Long Range Graph Benchmark with a 12% improvement on PascalVOC. This validates Chimera's methodological improvement, which allows it to directly capture the underlying topology, providing a strong inductive bias across modalities. Furthermore, being topologically aware enables our method to achieve a linear time complexity for sequences and images, in contrast to the quadratic complexity of attention.
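For context on the building block the abstract says is being generalized, below is a minimal, illustrative sketch of a standard diagonal state space model recurrence over a 1-D sequence, showing the linear-in-length cost the abstract contrasts with attention's quadratic cost. This is not Chimera's actual topology-aware formulation; the function name `ssm_scan` and all parameter shapes are assumptions made for illustration only.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal diagonal state space model recurrence (illustrative sketch).

    h_t = A * h_{t-1} + B * x_t
    y_t = C . h_t

    Runs in O(L * N) time for sequence length L and state size N,
    i.e. linear in the sequence length.
    """
    L = x.shape[0]
    N = A.shape[0]
    h = np.zeros(N)
    y = np.zeros(L)
    for t in range(L):
        h = A * h + B * x[t]   # element-wise update (diagonal A)
        y[t] = C @ h           # read out a scalar output
    return y

# Usage: random diagonal SSM applied to a length-16 scalar sequence
rng = np.random.default_rng(0)
A = np.full(4, 0.9)          # diagonal state transition
B = rng.standard_normal(4)   # input projection
C = rng.standard_normal(4)   # output projection
x = rng.standard_normal(16)
print(ssm_scan(A, B, C, x).shape)  # (16,)
```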
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10687