Keywords: Single-cell Foundation Models, Cell Retrieval, Benchmarking
TL;DR: We propose a systematic benchmark to assess the cell retrieval capabilities of single-cell foundation models.
Abstract: Efficiently and accurately searching large-scale single-cell RNA-seq databases has been a long-standing computational challenge. A growing number of single-cell retrieval methods have been proposed in the literature, particularly ones based on single-cell foundation models. However, the field lacks a comprehensive benchmark of these methods, owing to the absence of standard evaluation metrics and comprehensive benchmark datasets. To address these challenges, we propose a comprehensive evaluation benchmark that assesses 12 existing single-cell retrieval methods from three classes: non-machine-learning methods, VAE-based methods, and single-cell foundation model (scFM) based methods. We propose a series of label-dependent and label-free evaluation metrics to assess the performance of single-cell retrieval methods. From benchmarking across diverse settings (cross-platform, cross-species, and cross-omics), our notable findings include: top scFMs such as UCE, scFoundation, and SCimilarity show a substantial overall advantage over other methods; traditional non-machine-learning methods perform well in cell retrieval and thus should not be neglected; common cells retrieved by top methods share distinct gene expression patterns; and label-free metrics give evaluation outcomes consistent with those of label-dependent metrics and thus can be employed in a broader range of scenarios. Our rigorous and comprehensive evaluation identifies the challenges and limitations of current single-cell retrieval methods and serves as a foundation for their further development.
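Illustrative note: the abstract does not say which label-dependent metrics the benchmark uses. As a minimal sketch of what such a metric can look like, the snippet below computes precision@k over cell embeddings with cosine similarity. The function name `precision_at_k`, its parameters, and the choice of cosine similarity are assumptions made for illustration only, not the authors' method.

```python
import numpy as np

def precision_at_k(query_emb, ref_emb, query_labels, ref_labels, k=5):
    """Mean fraction of the k nearest reference cells that share the query's label.

    Hypothetical helper for illustration; assumes no embedding has zero norm.
    """
    query_emb = np.asarray(query_emb, dtype=float)
    ref_emb = np.asarray(ref_emb, dtype=float)
    # L2-normalise rows so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = q @ r.T  # shape: (n_queries, n_reference)
    # Indices of the k most similar reference cells for each query.
    top_k = np.argsort(-sim, axis=1)[:, :k]
    # Compare retrieved labels with each query's own label.
    hits = np.asarray(ref_labels)[top_k] == np.asarray(query_labels)[:, None]
    return float(hits.mean())

# Toy example: two cell types in a 2-D embedding space.
ref = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ref_types = ["T cell", "T cell", "B cell", "B cell"]
queries = [[1.0, 0.05], [0.05, 1.0]]
score = precision_at_k(queries, ref, ["T cell", "B cell"], ref_types, k=2)
```

A label-free metric would instead score the retrieved cells without consulting `ref_labels`, for example by comparing expression profiles directly; the abstract does not specify that either.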
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10172