eccDNAMamba: A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis

Published: 11 Jun 2025, Last Modified: 18 Jul 2025 · GenBio 2025 Poster · CC BY 4.0
Keywords: Extrachromosomal circular DNA, State-Space Model, Ultra-Long Sequence, Pretrained Model
TL;DR: We introduce eccDNAMamba, a bidirectional state-space model for ultra-long (up to 200 kbp) eccDNA sequences, enabling efficient and scalable representation learning with strong performance on real-world datasets.
Abstract: Extrachromosomal circular DNA (eccDNA) plays key regulatory roles and contributes to oncogene overexpression in cancer through high-copy amplification and long-range interactions. Despite advances in modeling, no pre-trained models currently support full-length circular eccDNA for downstream analysis. Existing genomic models are either limited to single-nucleotide resolution or hindered by the inefficiency of the quadratic attention mechanism. Here, we introduce eccDNAMamba, the first bidirectional state-space encoder tailored for circular DNA sequences. It combines forward and reverse passes for full-context representation learning with linear-time complexity, and preserves circular structure through a novel augmentation strategy. Tested on two real-world datasets, eccDNAMamba achieves strong classification performance and scales to sequences up to 200 kbp, offering a robust and efficient framework for modeling circular genomes. Our code is available at https://github.com/zzq1zh/GenAI-Lab.
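The abstract mentions an augmentation strategy that preserves circular structure, but does not spell it out. A common way to keep a circular sequence's break-point context visible to a linear model is to wrap part of the sequence around, or to train on rotations of it. The sketch below illustrates that general idea in Python; the function names, the overlap length, and the rotation count are illustrative assumptions, not the paper's actual method.

```python
def circular_augment(seq: str, overlap: int = 64) -> str:
    """Append the first `overlap` bases to the end, so motifs that
    span the arbitrary linearization break-point remain contiguous.
    (Illustrative sketch; not the paper's specified augmentation.)"""
    overlap = min(overlap, len(seq))
    return seq + seq[:overlap]

def rotations(seq: str, n: int = 4) -> list[str]:
    """Generate up to n evenly spaced rotations of a circular sequence,
    each an equally valid linearization of the same circle."""
    step = max(1, len(seq) // n)
    return [seq[i:] + seq[:i] for i in range(0, len(seq), step)][:n]
```

For example, `circular_augment("ACGT", 2)` yields `"ACGTAC"`, and every string in `rotations(seq)` represents the same circular molecule, so a model trained on them cannot rely on any particular break-point.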
Submission Number: 104