Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models

Published: 04 Mar 2026, Last Modified: 08 Mar 2026
Venue: ICLR 2026 Workshop LMRL Poster
License: CC BY 4.0
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: tiny / short paper (2-4 pages excluding references; extended abstract format)
Keywords: single-cell foundation models, embedding analysis, representation learning, computational biology
TL;DR: Final-layer embeddings are suboptimal for single-cell foundation models; intermediate layers encode better biological representations in a task- and context-dependent manner.
Abstract: Current single-cell foundation model benchmarks universally extract final-layer embeddings, assuming these represent optimal feature spaces. We systematically evaluate layer-wise representations from scFoundation (100M parameters) and Tahoe-X1 (1.3B parameters) across trajectory inference and perturbation response prediction. Our analysis reveals that optimal layers are task-dependent (trajectory peaks at 60% depth, 31% above final layers) and context-dependent (perturbation optima shift 0–96% across T cell activation states). Notably, first-layer embeddings outperform all deeper layers in quiescent cells, challenging assumptions about hierarchical feature abstraction. These findings demonstrate that "where" to extract features matters as much as "what" the model learns, necessitating systematic layer evaluation tailored to biological task and cellular context rather than defaulting to final-layer embeddings.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 34
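
The layer-wise evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`nearest_centroid_accuracy`, `select_best_layer`), the choice of a held-out nearest-centroid score, and the synthetic embeddings are all assumptions made for illustration. In practice the per-layer arrays would come from hidden states of a model such as scFoundation or Tahoe-X1, and the score would be a task-specific metric for trajectory inference or perturbation response prediction.

```python
import numpy as np


def nearest_centroid_accuracy(emb, labels, train_frac=0.5, seed=0):
    """Hypothetical layer score: held-out nearest-centroid accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_train = int(train_frac * len(labels))
    train, test = idx[:n_train], idx[n_train:]
    classes = np.unique(labels[train])
    # One centroid per class, computed on the training split only.
    centroids = np.stack(
        [emb[train][labels[train] == c].mean(axis=0) for c in classes]
    )
    # Pairwise distances with shape (n_test, n_classes).
    dists = np.linalg.norm(emb[test][:, None, :] - centroids[None, :, :], axis=-1)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == labels[test]).mean())


def select_best_layer(layer_embeddings, labels, score_fn=nearest_centroid_accuracy):
    """Score every layer and return (best index, relative depth, all scores)."""
    scores = [score_fn(e, labels) for e in layer_embeddings]
    best = int(np.argmax(scores))
    n_layers = len(layer_embeddings)
    # Relative depth: 0.0 is the first layer, 1.0 is the final layer.
    depth = best / (n_layers - 1) if n_layers > 1 else 0.0
    return best, depth, scores


# Synthetic stand-in for per-layer cell embeddings (assumption, not real data):
# the class signal is deliberately strongest in an intermediate layer.
rng = np.random.default_rng(42)
n_cells, dim = 400, 8
labels = rng.integers(0, 2, size=n_cells)
signal_per_layer = [0.0, 0.5, 1.5, 0.5, 0.2, 0.2]
layers = [
    rng.normal(size=(n_cells, dim)) + labels[:, None] * s * np.ones(dim)
    for s in signal_per_layer
]

best, depth, scores = select_best_layer(layers, labels)
print(f"best layer index: {best}, relative depth: {depth:.2f}")
print("per-layer scores:", [round(s, 3) for s in scores])
```

Keeping the score function as a parameter reflects the abstract's point that the best layer depends on both task and cellular context: the same sweep can be repeated with a different metric, or on a subset of cells such as a single activation state, rather than fixing one layer for all uses.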