Mechanistic evidence that motif-gated domain recognition drives contact prediction in protein language models

Published: 30 Sept 2025, Last Modified: 30 Sept 2025, Mech Interp Workshop (NeurIPS 2025) Poster, CC BY 4.0
Keywords: Circuit analysis, Understanding high-level properties of models, Causal interventions
Other Keywords: Protein Language Models, Contact Prediction, Case Studies
TL;DR: Using SAEs, we show evidence that protein LMs predict contacts via a motif-gated domain-recognition circuit.
Abstract: Protein language models (pLMs) achieve state-of-the-art performance on protein structure and function prediction tasks, yet their internal computations remain opaque. Sparse autoencoders (SAEs) have been applied to pLMs to recover sparse model features, called latents, whose activations correlate with known biological concepts. However, prior work has not established which latents are causally necessary for pLM performance on downstream tasks. Here, we adapt causal activation patching to the pLM setting and perform it in SAE latent space to extract the minimal circuit responsible for contact prediction accuracy in two case-study proteins. Preserving only a tiny fraction of latent-token pairs (0.022% and 0.015%) is sufficient to retain contact prediction accuracy in a residue unmasking experiment. We observe a two-step computation in which early-layer motif detectors respond to short local sequence patterns and gate mid-to-late-layer domain detectors that are selective for protein domains and families. Path-level ablations confirm the causal dependence of domain-detector latents on upstream motif-detector latents. To evaluate these components quantitatively, we introduce two diagnostics: a Motif Conservation Test and a hypothesis-driven Domain Selectivity Test. All candidate motif-detector latents pass the conservation test, and 18/23 candidate domain-detector latents achieve AUROC ≥ 0.95. To our knowledge, this is the first circuits-style causal analysis for pLMs, pinpointing the motifs, domains, and motif-domain interactions that drive contact prediction in two specific case studies. The framework introduced here will enable future mechanistic dissection of protein language models.
Submission Number: 117
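
Illustrative sketch (not from the paper): the abstract reports that 18/23 candidate domain-detector latents reach AUROC ≥ 0.95 on the Domain Selectivity Test, but it does not describe how the score is computed. The Python below shows one plausible reading, assuming each protein is summarized by a single activation value for a latent (for example its maximum over residues) and carries a binary label for membership in the hypothesized domain. The function name, the toy data, and the per-protein aggregation are all hypothetical; only the 0.95 threshold comes from the abstract.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (the Mann-Whitney U formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one activation summary per protein for a single
# candidate latent, and whether that protein contains the hypothesized domain.
activations = [0.9, 0.8, 0.7, 0.1, 0.2, 0.75]
in_domain = [1, 1, 1, 0, 0, 0]

score = auroc(activations, in_domain)
# Threshold taken from the abstract; everything else here is assumed.
passes = score >= 0.95
```

In this toy case one out-of-domain protein activates the latent more strongly than one in-domain protein, so the latent would fall short of the 0.95 threshold. The pairwise loop is quadratic in the number of proteins; for large evaluation sets a rank-based implementation would be the usual choice.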