Keywords: Deep Learning, Transformers, Biological Interpretability, Single-Cell Data Classification, Alzheimer's Disease
Abstract: Transformer models have achieved remarkable success across various domains, sparking breakthroughs in fields beyond their original applications. As a use case from neuroscience, distinguishing Alzheimer's disease (AD) cells from healthy brain cells using high-dimensional single-cell transcriptomic data is a challenging classification task. While attention-based models offer strong discriminative performance, accuracy alone is not sufficient: predictive accuracy must be balanced with biologically meaningful interpretations aligned with domain insights. In this work, we propose an interpretable Transformer architecture based on a Disease-Specific Conditional Guided Self-Attention (DSCGA) mechanism for Alzheimer's disease cell classification. The proposed approach maps each cell's gene expression vector into a set of tokens corresponding to a catalogue of known biological pathways. It first learns pathway-based representations to classify cells, followed by a biologically guided refinement using our DSCGA mechanism, which amplifies attention to pathways relevant to a specific condition while down-weighting irrelevant ones, adapting to the context of each cell observation. Specifically, DSCGA extends the standard self-attention mechanism by progressively incorporating a second term that is activated dynamically to help the model focus on condition-related biological pathways. The final attention scores are computed by adding the original self-attention scores to the condition-specific scores. The model requires post-training with DSCGA to condition attention on disease-specific signals. The ultimate goal is to distinguish AD cells from healthy cells while generating interpretation-driven predictions aligned with prior biological knowledge. Extensive experiments were carried out on two real-world datasets, Seattle and ROSMAP.
Experimental results demonstrate the effectiveness of our proposal and its ability to outperform baselines in biological interpretation quality while maintaining a controlled drop in accuracy. Specifically, for cells predicted as AD, our method increases the number of correctly identified AD-related pathways (as ranked by attention scores) from 3.96 to 18.98 (KEGG) and from 8.29 to 30.53 (WikiPathways) on Seattle, and from 3.88 to 18.89 (KEGG) and from 7.65 to 30.90 (WikiPathways) on ROSMAP. Furthermore, our proposal can be adapted to improve the domain-specific interpretability of several existing attention-based architectures whenever established external knowledge is available.
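The abstract describes the core of DSCGA as an additive combination: standard self-attention scores plus a dynamically gated condition-specific term over pathway tokens. A minimal NumPy sketch of that combination, under our own assumptions (single head, a per-pathway relevance prior `pathway_relevance` and a scalar `gate` standing in for the paper's dynamic activation; all names here are hypothetical, not the authors' implementation):

```python
import numpy as np

def dscga_attention(X, Wq, Wk, pathway_relevance, gate):
    """Sketch of condition-guided self-attention over pathway tokens.

    X:                 (n_tokens, d) pathway-token embeddings for one cell
    Wq, Wk:            (d, d) query/key projections
    pathway_relevance: (n_tokens,) prior relevance of each pathway to the
                       condition (assumed given by external knowledge)
    gate:              scalar in [0, 1]; 0 recovers standard self-attention
    """
    d = Wq.shape[1]
    Q, K = X @ Wq, X @ Wk
    scores = (Q @ K.T) / np.sqrt(d)            # standard self-attention scores
    cond = gate * pathway_relevance[None, :]   # condition-specific scores
    combined = scores + cond                   # final scores: sum of both terms
    # row-wise softmax over the combined scores
    e = np.exp(combined - combined.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

With `gate=0` the function reduces to plain scaled dot-product attention; raising the gate shifts attention mass toward pathways with high prior relevance, which is the down-weighting/amplification behavior the abstract describes.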
Primary Area: interpretability and explainable AI
Submission Number: 14666