Exploring Query-to-reference Mapping Challenges for Automated Single-Cell Atlas-based Diagnostics

Published: 06 Mar 2025, Last Modified: 18 Apr 2025, ICLR 2025 Workshop LMRL, CC BY 4.0
Track: Full Paper Track
Keywords: data integration, transcriptomics, diagnostics
TL;DR: Automated diagnostics using single-cell atlases shows great promise, but our benchmark of current integration tools reveals significant open challenges
Abstract: Single-cell atlases are built by integrating multiple heterogeneous datasets into a common embedding space. The aim is to reduce dataset-specific biases, or batch effects, while capturing the overall cellular composition and biological variability. One of the envisioned applications is automated diagnostics, where atlases are used as references to predict the phenotype of unseen patients. Here, we developed a diagnostic tool from a multi-disease atlas of inflammation. Moreover, we provide a benchmark of state-of-the-art integration methods for mapping and classifying unseen patients. In our tests, all the methods performed well when the query batch effects were well represented in the reference, but mostly failed otherwise. Notably, linear integration approaches demonstrated superior robustness and reduced hyperparameter sensitivity compared to more powerful variational autoencoder-based methods. These findings highlight two fundamental challenges: selecting the optimal integration method and managing previously unobserved batch effects when classifying new query patients. As a viable solution, we designed and tested a Centralized experimental scenario in which reference and query datasets are generated in the same center, demonstrating a potential pathway toward reliable atlas-based diagnostics.
Attendance: Francesco Craighero
Submission Number: 45