STATE-SPACE-LIKE MODELS TO CALL COPY NUMBERS

Published: 06 Mar 2025, Last Modified: 18 Apr 2025, ICLR 2025 Workshop LMRL, CC BY 4.0
Track: Tiny Paper Track
Keywords: mamba, transformer alternatives, state space models, genomics, cancer
TL;DR: Transformer alternatives for zero-shot calling of copy number alterations using simulation-based training
Abstract:

Somatic copy number alterations (CNAs) are hallmarks of cancer. Current algorithms that call CNAs from whole-genome sequencing (WGS) data have not exploited deep learning methods owing to computational scaling limitations. Here, we present araCNA, a novel deep-learning approach trained only on simulated data that can accurately predict CNAs in real WGS cancer genomes. araCNA uses novel transformer alternatives (e.g., Mamba) to handle genomic-scale sequence lengths ($\sim$1M) and learn long-range interactions. Results are extremely accurate on simulated data, and this zero-shot approach is on par with existing methods when applied to 50 WGS samples from The Cancer Genome Atlas. Notably, our approach requires only a tumour sample and not a matched normal sample, shows fewer markers of overfitting, and performs inference in only a few minutes.
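
To make the idea of simulation-based training concrete, the following is a minimal sketch of how a synthetic tumour profile with known copy numbers might be generated. It is not araCNA's simulator, which the abstract does not describe. The function name `simulate_profile`, its parameters, the Gaussian noise levels, and the standard purity-mixture formulas for expected read depth and B-allele frequency (BAF) are all assumptions chosen for illustration.

```python
import random


def simulate_profile(n_bins=1000, n_segments=5, purity=0.7, mean_depth=30.0, seed=0):
    """Hypothetical simulator: piecewise-constant copy numbers with noisy signals.

    Assumes a tumour/normal mixture in which normal cells are diploid (1 + 1).
    """
    rng = random.Random(seed)
    # Pick distinct interior breakpoints so every segment has at least one bin.
    breakpoints = sorted(rng.sample(range(1, n_bins), n_segments - 1))
    bounds = [0] + breakpoints + [n_bins]

    major, minor, depth, baf = [], [], [], []
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg_major = rng.randint(1, 4)          # major-allele copy number (assumed range)
        seg_minor = rng.randint(0, seg_major)  # minor allele never exceeds major
        total = seg_major + seg_minor

        # Average copies per cell across tumour (purity) and normal (1 - purity) cells.
        mixed_total = purity * total + (1 - purity) * 2
        expected_depth = mean_depth * mixed_total / 2
        expected_baf = (purity * seg_minor + (1 - purity) * 1) / mixed_total

        for _ in range(end - start):
            major.append(seg_major)
            minor.append(seg_minor)
            # Gaussian noise is an illustrative choice, not a claim about real data.
            depth.append(max(0.0, rng.gauss(expected_depth, 0.1 * expected_depth)))
            baf.append(min(1.0, max(0.0, rng.gauss(expected_baf, 0.03))))

    return {"major": major, "minor": minor, "depth": depth, "baf": baf}


profile = simulate_profile()
print(len(profile["depth"]), profile["major"][:5], profile["minor"][:5])
```

In a setup like this, the per-bin `depth` and `baf` values would serve as model inputs and the `major` and `minor` copy numbers as training labels, so unlimited labelled examples can be drawn without any real annotated genomes.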

Attendance: Ellen Visscher
Submission Number: 5