Keywords: Interpretable Machine Learning, Biological Sequence Analysis, Applications in Biology
Abstract: Deep learning models have significantly advanced genomic sequence modeling by uncovering complex patterns, but their black-box nature limits the biological insight that can be drawn from their predictions. Recent efforts have applied post-hoc methods to identify important nucleotides or motifs, but these approaches are decoupled from the prediction process and often yield inaccurate attributions. In this paper, we bring the Shapley Additive Self-Attribution (SASA) framework to genomic sequence modeling and propose BioSASANet. BioSASANet combines a marginal-contribution-based sequential module that captures long-range nucleotide interactions with a positional Shapley value module that explicitly models the reverse-complement (RC) property of genomic sequences, enabling position-aware, biologically grounded self-attribution. Experiments on two genomic tasks show that BioSASANet achieves accurate predictions while providing more faithful nucleotide-level attributions. Additionally, its flexible design allows integration with state-of-the-art DNA language model backbones, equipping these models with Shapley value-based self-interpretation.
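To make the two ideas in the abstract concrete (a prediction that decomposes additively into per-nucleotide contributions, and attributions that respect the reverse-complement property), the sketch below uses a deliberately simple additive scorer. It is not BioSASANet: the position-specific weight matrix, the strand-averaging rule, and all function names (`reverse_complement`, `position_contributions`, `rc_aware_attribution`, `predict`) are hypothetical illustrations. For a purely additive model with a zero baseline, each per-position term coincides with that position's Shapley value, so the attributions sum exactly to the prediction minus the bias.

```python
import numpy as np

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    # Reverse the sequence and complement each base (A<->T, C<->G).
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def one_hot(seq: str) -> np.ndarray:
    # Encode a DNA string as an (L, 4) one-hot matrix in A, C, G, T order.
    idx = [BASES.index(b) for b in seq]
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out


def position_contributions(seq: str, weights: np.ndarray) -> np.ndarray:
    # Hypothetical additive scorer: position i contributes weights[i, base_i].
    return (one_hot(seq) * weights).sum(axis=1)


def rc_aware_attribution(seq: str, weights: np.ndarray) -> np.ndarray:
    # Score both strands, map reverse-strand position j back to forward
    # position L-1-j, and average, so the attributions (read on the forward
    # strand) are identical whichever strand is given as input.
    fwd = position_contributions(seq, weights)
    rev = position_contributions(reverse_complement(seq), weights)
    return 0.5 * (fwd + rev[::-1])


def predict(seq: str, weights: np.ndarray, bias: float):
    # Self-attributing prediction: output = bias + sum of per-position terms.
    phi = rc_aware_attribution(seq, weights)
    return bias + phi.sum(), phi
```

Averaging the two strands is only one simple way to obtain RC-consistent attributions. Because the same averaged terms both form the output and serve as the explanation, the attribution is part of the prediction rather than a separate post-hoc step.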
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9091