What Does a Chromatin Foundation Model Know About a Petri Dish? Sparse Autoencoders Reveal In Vitro vs. In Vivo Context in EPIBERT

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: GenBio 2026 Poster
License: CC BY 4.0
Keywords: sparse autoencoders, gene expression models, chromatin, mechanistic interpretability
Abstract: Foundation models and AI agents that act on biological data are only as trustworthy as our understanding of what those models internally represent. Epigenomics foundation models are trained predominantly on cell-line data yet routinely applied to in vivo biology, with direct consequences for any downstream agentic system that consumes their predictions. We ask whether EPIBERT, a transformer pre-trained on ATAC-seq chromatin accessibility data, internally encodes an in vitro vs. in vivo contrast across six matched biosample conditions spanning blood, liver, and lymph lineages. We train layer-wise Sparse Autoencoders (SAEs) with BatchTopK activations, introduce the Context Divergence Score (CDS) to identify context-specific features, and validate them through causal ablation, linear context-steering, and three-level biological annotation (ChromHMM, HOMER, GO:BP). Context-specific features grow 3.8-fold from early to late layer (57 → 215 Bonferroni-significant features); causal ablation yields a large effect; context-steering closes 11.2% of the prediction gap at 4.5× above random; and biological annotation confirms tissue-specific features are enriched for lineage-defining transcription factors (HNF4A/FOXA2 in liver, SPI1/RUNX1 in blood, EBF1/PAX5 in lymph), active regulatory elements, and tissue-specific processes. Together, these results provide a mechanistic and biological audit of an epigenomics foundation model and a concrete intervention path for tissue-aware deployment. Code is available at https://anonymous.4open.science/r/in_vivo_vs_in_vitro_chromatin_contexts/.
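For readers unfamiliar with the BatchTopK activation mentioned in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the commonly described form of BatchTopK: after a ReLU, only the k × (batch size) largest pre-activations across the whole batch are kept, so individual samples may use more or fewer than k features. The function name `batch_topk`, the list-of-lists input layout, and the ReLU step are assumptions made for illustration.

```python
def batch_topk(pre_acts, k):
    """Keep the k * batch_size largest activations across the whole batch.

    pre_acts: list of rows (one per sample) of SAE encoder pre-activations.
    k: average number of active features allowed per sample.
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    n_rows = len(pre_acts)
    # Apply ReLU and remember where each value came from.
    flat = [
        (max(v, 0.0), i, j)
        for i, row in enumerate(pre_acts)
        for j, v in enumerate(row)
    ]
    # The budget is shared by the batch rather than enforced per sample.
    budget = min(k * n_rows, len(flat))
    # Stable sort: ties keep their original (row, column) order.
    kept = sorted(flat, key=lambda t: t[0], reverse=True)[:budget]
    out = [[0.0] * len(row) for row in pre_acts]
    for value, i, j in kept:
        out[i][j] = value
    return out


example = [
    [0.9, 0.1, 0.8, -0.5],
    [0.2, 0.05, -0.3, 0.0],
]
sparse = batch_topk(example, k=1)
```

With k = 1 and two samples the batch budget is two activations, and both go to the first sample here (0.9 and 0.8), leaving the second sample with none. A per-sample TopK would instead keep exactly one activation in each row.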
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 41