GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling

Published: 28 May 2026, Last Modified: 28 May 2026, ICML 2026 FM4LS Workshop Poster, CC BY 4.0
Keywords: Foundation Model, scATAC-seq, Multi-scale modeling
TL;DR: GFETM is an interpretable scATAC-seq model that integrates a genome foundation model with an embedded topic model to improve cell representation learning, transferability, and transcription factor activity interpretation.
Abstract: Single-cell Assay for Transposase-Accessible Chromatin with sequencing (scATAC-seq) enables investigation of open chromatin landscapes at single-cell resolution, but its analysis remains challenging because of sparsity, noise, and dataset-specific peak vocabularies. Genome Foundation Models (GFMs), pre-trained on large DNA sequence corpora, offer a potential source of transferable sequence information for scATAC-seq modeling. We introduce the Genome Foundation Embedded Topic Model (GFETM), an interpretable framework that combines GFMs with the Embedded Topic Model (ETM) for sequence-informed scATAC-seq analysis. By integrating GFM-derived DNA sequence embeddings into a topic-model decoder, GFETM improves clustering quality on standard benchmarks and captures cell-state-specific transcription factor activity through motif scoring and attention-based interpretation.
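The idea of feeding sequence embeddings into a topic-model decoder can be illustrated with a minimal sketch. It follows the generic ETM formulation, in which topic-to-feature distributions come from a softmax over inner products of topic and feature embeddings; it is not taken from the GFETM implementation. The names `rho`, `alpha`, and `theta`, the toy dimensions, and the random vectors standing in for GFM-derived peak embeddings are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_cells, n_peaks, n_topics, emb_dim = 4, 50, 3, 8  # toy sizes (assumed)

# Hypothetical stand-in for GFM-derived peak sequence embeddings.
# In a real pipeline these would come from a genome foundation model.
rho = rng.normal(size=(n_peaks, emb_dim))

# Topic embeddings living in the same space as the peak embeddings.
alpha = rng.normal(size=(n_topics, emb_dim))

# Per-cell topic mixtures; an encoder would normally produce these.
theta = softmax(rng.normal(size=(n_cells, n_topics)), axis=1)

# ETM-style decoder: each topic is a distribution over peaks.
beta = softmax(alpha @ rho.T, axis=1)   # shape (n_topics, n_peaks)

# Expected peak accessibility profile for each cell.
recon = theta @ beta                    # shape (n_cells, n_peaks)
```

Because each row of `theta` and each row of `beta` sums to one, each row of `recon` is itself a probability distribution over peaks. Tying `beta` to peak embeddings rather than to a free peak-by-topic matrix is what would let sequence information from the GFM shape the topics in a design of this kind.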
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 13