On The Robustness of scRNA-seq Foundation Models for Plants Under Cross-Domain Experimental Shift

Published: 04 Mar 2026, Last Modified: 11 Mar 2026 · ICLR 2026 Workshop LMRL Poster · CC BY 4.0
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: tiny / short paper (2-4 pages excluding references; extended abstract format)
Keywords: scRNA-seq, plant scRNA-seq, Arabidopsis thaliana, foundation models, transcriptomic embeddings, cross-domain shift, cross-study generalization, distribution shift, evaluation protocols, random split, replicate-held-out split, cross-experiment transfer, stress classification, batch effects, gene identity preservation, per-gene embeddings, ensemble classifiers, raw counts baseline, benchmark design
TL;DR: We build an Arabidopsis single-cell foundation model and show random splits overestimate performance; gene-identity preserving embeddings generalize best across experiments.
Abstract: Foundation models for single-cell transcriptomics promise to learn generalizable representations of cellular states, yet their robustness to cross-study distribution shift remains underexplored in plant systems. We introduce scAraFM, an Arabidopsis-specific foundation model, and evaluate its utility for stress prediction across leaf and root scRNA-seq datasets under three increasingly challenging protocols: random splits within a single experiment, replicate-held-out splits within a single experiment, and cross-experiment transfer learning. In single-experiment settings, we find that random splits can overestimate performance by 20--30 AUROC points relative to replicate-held-out evaluation and to transfer across independent experiments, underscoring the need for study-aware validation in fragmented transcriptomic landscapes. Across representation strategies, gene-identity-preserving features consistently outperform pooled summaries, even when the latter are derived from pretrained transformers. Notably, simple baselines using raw counts remain competitive with or superior to learned embeddings in single-experiment scenarios, challenging claims of a universal advantage for foundation-model-derived features. Our promising results on cross-experiment transfer learning emphasize that evaluation design is as critical as model architecture, and that preserving per-gene structure aids generalization in downstream tasks.
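To make the distinction between the first two evaluation protocols concrete, here is a minimal, self-contained sketch (pure Python, illustrative names only; this is not the paper's code). A random cell-level split lets cells from the same replicate appear in both train and test, leaking batch structure, whereas a replicate-held-out split assigns entire replicates to the test set:

```python
import random

def random_split(cells, test_frac=0.2, seed=0):
    """Random cell-level split: cells from the same replicate can land
    in both train and test, so batch effects leak across the split."""
    rng = random.Random(seed)
    shuffled = cells[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def replicate_held_out_split(cells, held_out_replicates):
    """Study-aware split: every cell from a held-out replicate goes to
    the test set, so train and test never share a replicate/batch."""
    train = [c for c in cells if c["replicate"] not in held_out_replicates]
    test = [c for c in cells if c["replicate"] in held_out_replicates]
    return train, test

# Toy data: 12 cells spread over 3 replicates (hypothetical structure).
cells = [{"id": i, "replicate": f"rep{i % 3}"} for i in range(12)]

tr, te = replicate_held_out_split(cells, {"rep2"})
# Train and test replicate sets are disjoint by construction.
assert not {c["replicate"] for c in tr} & {c["replicate"] for c in te}
```

The same grouping logic extends to cross-experiment transfer by holding out whole experiments rather than replicates; group-aware splitters (e.g. scikit-learn's `GroupShuffleSplit`) implement this pattern generically.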
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 73