Structured and interpretable patient embeddings from Single-Cell Foundation Models

Published: 02 Mar 2026 · Last Modified: 10 Mar 2026 · Gen² 2026 Poster · CC BY 4.0
Track: Full / long paper (5-8 pages)
Keywords: Foundation Models, Patient Representation Learning, Concept Bottleneck Models, Gaussian Mixture Variational Autoencoders
TL;DR: SCOPE (Structured Compositional Patient Embeddings), a model for learning interpretable patient representations from transcriptomic scFMs using a Concept Bottleneck Gaussian Mixture Variational Autoencoder (CB-GM-VAE)
Abstract: Recent advances have led to a rapid proliferation of single-cell foundation models (scFMs); however, methods for extracting biologically meaningful and interpretable knowledge from these large pre-trained models remain limited. We propose SCOPE (Structured Compositional Patient Embeddings), a model for learning interpretable patient representations from transcriptomic scFMs using a Concept Bottleneck Gaussian Mixture Variational Autoencoder (CB-GM-VAE). SCOPE models the distribution of cell types and a set of pre-defined concepts across patients from single-cell representations, resulting in patient representations that are both structured and interpretable. Using a single-cell RNA-seq breast cancer atlas, we demonstrate that patient representations extracted from a continually pre-trained scFM by the CB-GM-VAE can outperform both specialized patient representation learning baselines and simple pseudobulk approaches in various downstream prediction tasks. Moreover, the learned concept activities highlight biologically meaningful differences between primary and invasive tumors, particularly involving \(\mathrm{CD4}^+ \text{ T cells}\), mast cells, and endothelial cells, that are well supported by prior studies. Collectively, these findings demonstrate that SCOPE enables the extraction of human-interpretable, disease-relevant signatures from scFMs, bridging the gap between foundation models and mechanistic insight in translational genomics.
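To make the described architecture concrete, below is a minimal sketch of a concept-bottleneck Gaussian mixture VAE over scFM cell embeddings. All dimensions, layer sizes, the soft mixture collapsing, and the mean-pooling aggregation into a patient vector are illustrative assumptions for exposition; this is not the authors' implementation of SCOPE.

```python
# Minimal sketch of a CB-GM-VAE over scFM cell embeddings (illustrative
# assumptions throughout; not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBGMVAE(nn.Module):
    def __init__(self, emb_dim=512, n_concepts=32, n_components=10, latent_dim=16):
        super().__init__()
        # Concept bottleneck: map the scFM cell embedding to a small vector of
        # interpretable concept activities.
        self.concept_head = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(), nn.Linear(256, n_concepts)
        )
        # Soft mixture assignment (e.g., over cell types) from the concepts.
        self.assign_head = nn.Linear(n_concepts, n_components)
        # Per-component Gaussian posterior parameters.
        self.mu_head = nn.Linear(n_concepts, n_components * latent_dim)
        self.logvar_head = nn.Linear(n_concepts, n_components * latent_dim)
        # Decoder reconstructs the scFM embedding from the latent sample.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )
        self.n_components, self.latent_dim = n_components, latent_dim

    def forward(self, x):
        c = torch.sigmoid(self.concept_head(x))        # concept activities in [0, 1]
        pi = F.softmax(self.assign_head(c), dim=-1)    # soft cell-type assignment
        mu = self.mu_head(c).view(-1, self.n_components, self.latent_dim)
        logvar = self.logvar_head(c).view(-1, self.n_components, self.latent_dim)
        # Collapse the mixture with the soft assignment, then reparameterize.
        mu_mix = (pi.unsqueeze(-1) * mu).sum(1)
        logvar_mix = (pi.unsqueeze(-1) * logvar).sum(1)
        z = mu_mix + torch.randn_like(mu_mix) * torch.exp(0.5 * logvar_mix)
        return self.decoder(z), c, pi, mu_mix, logvar_mix

def patient_embedding(model, cell_embs):
    """Pool per-cell concept activities and mixture proportions into one
    structured patient vector (an assumed aggregation scheme)."""
    with torch.no_grad():
        _, c, pi, _, _ = model(cell_embs)
    return torch.cat([c.mean(0), pi.mean(0)])  # concepts + cell-type composition

model = CBGMVAE()
cells = torch.randn(200, 512)          # 200 cells from one patient (dummy scFM output)
rep = patient_embedding(model, cells)  # structured, interpretable patient vector
```

The key design point the sketch illustrates is that everything downstream of the bottleneck depends on `x` only through the concept activities `c`, so the resulting patient vector decomposes into named concepts plus a cell-type composition rather than an opaque embedding.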
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 73