Keywords: scaling laws, scRNA-seq, data quality
Abstract: Learning meaningful representations of cellular states is a key problem in computational biology. Yet, the scaling behavior of single-cell representation learning models remains poorly understood. While recent work has proposed that model performance scales predictably with measurement noise, this hypothesis has only been validated with relatively small models and datasets. In this work-in-progress, we present the first empirical evidence supporting measurement noise scaling laws at large scale, using datasets on the order of $10^7$ cells and transformer-based models with $>10^7$ parameters. We demonstrate that previously observed noise-scaling behavior consistently emerges in these large-scale models and datasets. Our results provide further evidence that measurement noise is an important scaling axis for cellular representation learning.
Submission Number: 53