MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis

Published: 18 Jun 2024, Last Modified: 07 Jul 2024 · TF2M 2024 Poster · CC BY 4.0
Keywords: State Space Models, Mamba, Transformer, Attention, Training Methods, Subquadratic Models, Multiple Sequence Alignments, Genomics
TL;DR: Applying Mamba's selective scan along the sequence dimension of DNA multiple sequence alignments achieves state-of-the-art performance on long-context variant effect prediction and genomic benchmark tasks with reduced computational complexity.
Abstract: We introduce MSAMamba, a novel architecture designed to address the context-length limitation of existing transformer-based models for DNA multiple sequence alignments (MSAs). Traditional transformers struggle with the vast context lengths inherent in MSA genome data, mainly due to the quadratic complexity of self-attention at large batch sizes. MSAMamba applies a selective scan operation along the sequence dimension and decouples processing along the sequence-length and MSA dimensions to improve efficiency. This architecture enables scalable analysis of long DNA sequences, increasing the training context length of previous methods by 8x. In addition, we develop a row-sparse training method that significantly reduces the computational overhead of the selective scan. We demonstrate that MSAMamba achieves performance on par with state-of-the-art (SOTA) transformer-based models in variant effect prediction tasks and exceeds their performance at longer context lengths. We also demonstrate that MSAMamba excels in long-context GenomicBenchmarks tasks. Our results indicate that MSAMamba mitigates the computational challenges of long-context DNA MSA analysis and sets a new standard for scalability and efficiency in genomic modeling.
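To make the dimension separation described in the abstract concrete, the following is a hypothetical PyTorch sketch, not the authors' implementation: MSA rows are folded into the batch so a sequence model scans along the sequence dimension independently per row, and a separate lightweight step mixes information along the MSA dimension at each position. An nn.GRU stands in for Mamba's selective scan so the example runs without specialized kernels, and all module names, shapes, and hyperparameters are invented for illustration.

# Hypothetical sketch of the sequence/MSA dimension separation; not the paper's code.
# An nn.GRU stands in for Mamba's selective scan so the example runs anywhere.
import torch
import torch.nn as nn

class MSABlockSketch(nn.Module):
    def __init__(self, d_model: int = 128):
        super().__init__()
        # Stand-in for the selective scan: any (batch, length, d_model) sequence mixer fits here.
        self.seq_mixer = nn.GRU(d_model, d_model, batch_first=True)
        # Separate, cheap mixing along the MSA (row) dimension: per-position summary of all rows.
        self.row_mixer = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_rows, seq_len, d_model) -- one aligned MSA per batch element.
        b, r, l, d = x.shape
        # 1) Sequence-dimension scan: fold rows into the batch so each row is scanned independently.
        y, _ = self.seq_mixer(x.reshape(b * r, l, d))
        y = y.reshape(b, r, l, d)
        # 2) MSA-dimension mixing: average across rows at each position and broadcast back.
        row_summary = self.row_mixer(y.mean(dim=1, keepdim=True))  # (batch, 1, seq_len, d_model)
        return y + row_summary

# Toy usage: 2 MSAs, 8 rows each, length 1024, feature size 128.
block = MSABlockSketch(d_model=128)
out = block(torch.randn(2, 8, 1024, 128))
print(out.shape)  # torch.Size([2, 8, 1024, 128])

Because the per-row scan is linear in sequence length, this layout avoids the quadratic cost that full self-attention would incur over the same context; how the paper's row-sparse training further reduces the cost of the scan is not detailed in the abstract and is therefore not sketched here.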
Submission Number: 64