Abstract: Transformer-based deep learning methods have emerged as the standard approach for modeling diverse data such as sequences, images, and graphs. These methods rely on self-attention, which treats data as an unordered set of elements. This ignores the neighborhood structure or graph topology of the data and requires the use of inductive biases, such as position embeddings in sequences and images, and random walks in graphs, to incorporate topology. However, developing bespoke inductive biases for each task requires significant effort and can introduce side effects that hinder generalization. In this work, we introduce Chimera, a unified model that directly incorporates data topology in a principled way, obviating the need for domain-specific biases. Central to Chimera is the observation that state-space models---which naturally do not require position embeddings---can be generalized to capture any graph topology. Our experiments demonstrate the versatility of our approach---Chimera achieves strong performance across the domains of language, vision, and graphs, outperforming BERT on GLUE by 0.7 points, ViT on ImageNet-1k by 2.6%, and all baselines on the Long Range Graph Benchmark. These results validate Chimera's principled methodological contributions and affirm the long-held belief that data topology is a powerful inductive bias across modalities. We further propose algorithmic optimizations that improve Chimera's efficiency while maintaining performance: 1) for the subclass of directed acyclic graphs, we show that Chimera can be implemented as a linear-time recurrence; 2) for general graphs, we relax the method with a simple mathematical approximation, matching the Transformer's quadratic complexity without relying on domain-specific biases.
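The linear-time DAG case is easy to picture. Below is a minimal sketch, not the paper's exact formulation, of a linear state-space recurrence unrolled over a DAG in topological order: each node's state is updated from an aggregation of its parents' states and its own features, so one forward pass costs time linear in the number of edges. The function name `dag_ssm_forward`, the parent-mean aggregation, and the shared `A`, `B`, `C` matrices are illustrative assumptions, not details taken from the submission.

```python
import numpy as np

def dag_ssm_forward(x, parents, A, B, C):
    """Sketch of a state-space recurrence over a DAG.

    x: (n_nodes, d_in) node features, with nodes indexed in topological
       order (every parent index precedes its children).
    parents[v]: list of parent indices of node v.
    A, B, C: shared state-transition, input, and readout matrices.
    """
    n, d_state = x.shape[0], A.shape[0]
    h = np.zeros((n, d_state))
    y = np.zeros((n, C.shape[0]))
    for v in range(n):                      # one visit per node: O(nodes + edges)
        if parents[v]:                      # aggregate incoming parent states (mean is an assumption)
            h_prev = np.mean([h[p] for p in parents[v]], axis=0)
        else:                               # source node: no incoming state
            h_prev = np.zeros(d_state)
        h[v] = A @ h_prev + B @ x[v]        # linear SSM update
        y[v] = C @ h[v]                     # readout
    return y

# A path graph (a plain sequence) recovers the standard SSM recurrence:
rng = np.random.default_rng(0)
d_in, d_state, d_out, n = 4, 8, 3, 5
A = rng.normal(size=(d_state, d_state)) * 0.1
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
x = rng.normal(size=(n, d_in))
parents = [[]] + [[v - 1] for v in range(1, n)]
print(dag_ssm_forward(x, parents, A, B, C).shape)  # (5, 3)
```

On a sequence (each node's only parent is its predecessor) this reduces to the familiar per-token SSM recurrence, which is why no position embeddings are needed; general graphs with cycles require the relaxation described in the abstract rather than this topological-order pass.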
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have addressed the reviewers' comments, suggestions, and requests for clarification. The corresponding changes are highlighted in blue. We have also provided the code for our experiments in the supplementary material.
Assigned Action Editor: ~Hankook_Lee1
Submission Number: 4879