A pretrained SCVI model for 60,000 drug perturbation experiments in 100 million cells

Published: 06 Mar 2025, Last Modified: 18 Apr 2025
Venue: ICLR 2025 Workshop LMRL
License: CC BY 4.0
Track: Tiny Paper Track
Keywords: Representation learning, SCVI, Drug screen, Perturbation, Cancer cell lines
TL;DR: We present an SCVI model pretrained on a dataset with 60,000 drug perturbation experiments.
Abstract: We present a pretrained SCVI model for the Tahoe-100M single-cell dataset, enabling large-scale single-cell analyses on systems with limited GPU memory. The approach compresses expression profiles from over 95 million cells into a 42 GB “minified” file plus a 1 GB model, preserving essential biological signals while remaining practical for routine exploratory tasks. The openly available model supports downstream analyses such as differential expression and method development, without requiring access to the entire raw dataset.
Attendance: Valentine Svensson
Submission Number: 39
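
The storage figures quoted in the abstract imply a per-cell budget that is easy to check by hand. The sketch below works through that arithmetic. It assumes 1 GB = 10^9 bytes and uses exactly 95 million cells, so the per-cell value is an upper bound, since the abstract says "over 95 million". The float32 comparison is only an illustrative yardstick; the source does not state how the minified file is laid out or what it stores per cell.

```python
# Back-of-envelope storage footprint implied by the abstract's figures.
# Assumptions (not from the source): 1 GB = 10**9 bytes, and exactly
# 95 million cells, which makes the per-cell figure an upper bound.

MINIFIED_BYTES = 42 * 10**9  # 42 GB "minified" file
MODEL_BYTES = 1 * 10**9      # 1 GB model
N_CELLS = 95 * 10**6         # lower bound on the number of cells


def bytes_per_cell(total_bytes: int, n_cells: int) -> float:
    """Average number of bytes available per cell."""
    return total_bytes / n_cells


per_cell = bytes_per_cell(MINIFIED_BYTES, N_CELLS)

# Illustrative yardstick only: how many 4-byte float32 values fit in that
# budget. The actual file contents are not described in the source.
float32_equiv = per_cell / 4

total_gb = (MINIFIED_BYTES + MODEL_BYTES) / 10**9

print(f"Minified file: about {per_cell:.0f} bytes per cell")
print(f"Equivalent to about {float32_equiv:.0f} float32 values per cell")
print(f"Total download (file + model): {total_gb:.0f} GB")
```

Under these assumptions the minified file averages roughly 440 bytes per cell, about the size of 110 single-precision numbers, which gives a sense of how compact the stored representation is compared with a full per-gene expression profile.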