Keywords: Deep Learning, attention, motifs, genomics, motif interactions, interpretability
Abstract: A major goal of computational genomics is to understand how sequence patterns, called motifs, interact to regulate gene expression. In principle, convolution-attention networks (CANs) should provide an inductive bias to infer motif interactions; convolutions can capture motifs while self-attention learns their interactions. However, it is unclear the extent to which this is true in practice. Here we perform an empirical study on synthetic data to test the efficacy of uncovering motif interactions in CANs. We find that irrespective of design choice, interpreting local attention (i.e. on an individual sequence basis) is noisy, leading to many false positive motif interactions. To address this issue, we propose Global Interactions via Filter Activity Correlations (GLIFAC). GLIFAC robustly uncovers motif interactions across a wide spectrum of model choices. This work provides guidance on design choices for CANs that lead to better interpretability for regulatory genomics without sacrificing generalization performance.
Track: Original Research Track