Additional Submission Instructions: For the camera-ready version, please include the author names and affiliations, funding disclosures, and acknowledgements.
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: sequence-to-function models, explainable AI, gene regulation, virtual experiments, genomics
TL;DR: Explainable AI–guided virtual experiments reveal that DNA sequence context substantially shapes predicted enhancer activity, demonstrating that context plays a key role in the cis-regulatory code of gene regulation.
Abstract: Deciphering the cis-regulatory code, the rules by which DNA sequence governs gene regulation, is a central challenge in biology with wide-ranging implications for understanding disease mechanisms and engineering DNA for synthetic biology and therapeutic applications. Deep learning models consistently achieve state-of-the-art performance in predicting regulatory activity from DNA sequence, but their black-box nature limits mechanistic insight. Post hoc interpretability tools have identified important sequence motifs corresponding to transcription factor (TF) binding sites, yet the quantitative contribution of surrounding sequence context remains poorly understood. Here, we treat a high-performing sequence-to-function model as a virtual experimental platform, pairing explainable AI with large-scale in silico motif-context swap experiments to quantify the relative contributions of TF motifs and surrounding sequence context to the model’s predicted enhancer activity. Using attribution maps, we identify and localize motif instances, then systematically transplant identical motif syntax between different sequence contexts and measure changes in predicted activity to estimate each component’s effect. Surprisingly, we find that sequence context plays an outsized role compared to motifs, sometimes accounting for most of the predicted activity. Context effects are most pronounced in housekeeping gene programs, where motifs modestly tune a baseline set by sequence context, whereas developmental programs show stronger motif-driven regulation. Our results motivate a paradigm shift from motif-centric models toward quantitative motif–context frameworks that treat sequence context as an active component of the cis-regulatory code rather than a passive scaffold.
Submission Number: 225
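
The motif-context swap design described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: `toy_predict` is a hypothetical stand-in for a trained sequence-to-function model, the motif positions are given by hand rather than derived from attribution maps, and the two-way additive decomposition in `effect_variances` is one plausible way to split predicted activity into context, motif, and interaction components.

```python
import numpy as np

def swap_motif(context_seq, start, motif):
    """Return context_seq with the window beginning at `start` replaced by `motif`."""
    end = start + len(motif)
    if start < 0 or end > len(context_seq):
        raise ValueError("motif window falls outside the sequence")
    return context_seq[:start] + motif + context_seq[end:]

def swap_matrix(predict, contexts, starts, motifs):
    """Build y[i, j]: predicted activity of context i carrying motif j."""
    y = np.empty((len(contexts), len(motifs)))
    for i, (ctx, start) in enumerate(zip(contexts, starts)):
        for j, motif in enumerate(motifs):
            y[i, j] = predict(swap_motif(ctx, start, motif))
    return y

def effect_variances(y):
    """Two-way additive decomposition of the swap matrix (an assumed analysis)."""
    grand = y.mean()
    context_effect = y.mean(axis=1) - grand   # one value per context (row)
    motif_effect = y.mean(axis=0) - grand     # one value per motif (column)
    residual = y - grand - context_effect[:, None] - motif_effect[None, :]
    return {
        "context": float(np.mean(context_effect ** 2)),
        "motif": float(np.mean(motif_effect ** 2)),
        "interaction": float(np.mean(residual ** 2)),
    }

def toy_predict(seq):
    """Hypothetical model: GC fraction plus a fixed bonus if 'TGACTCA' is present."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    return gc + (0.5 if "TGACTCA" in seq else 0.0)

# Two hand-made sequences; each holds a 7-bp motif window starting at index 6.
contexts = ["GCGCGC" + "TGACTCA" + "GCGCGC", "ATATAT" + "AAAAAAA" + "ATATAT"]
starts = [6, 6]
motifs = ["TGACTCA", "AAAAAAA"]

y = swap_matrix(toy_predict, contexts, starts, motifs)
variances = effect_variances(y)
```

With a real model, `predict` would wrap the network's forward pass, and comparing the `"context"` and `"motif"` entries of the result gives a rough sense of which component dominates predicted activity under this additive assumption.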