Towards Interpretable Protein Structure Prediction with Sparse Autoencoders

Nithin Parsan; David J Yang; John Jingxuan Yang

Published: 06 Mar 2025, Last Modified: 26 Apr 2025, GEM, CC BY 4.0

Track: Machine learning: computational method and/or computational results

Nature Biotechnology: Yes

Keywords: Mechanistic Interpretability; Protein Structure Prediction; Sparse Autoencoders; Deep Learning; ESMFold; Protein Language Models

TL;DR: We introduce hierarchical sparse autoencoders trained on ESM2-3B to interpret protein structure prediction models, enabling targeted interventions on ESMFold and revealing how sequence features influence structural predictions.

Abstract: Protein language models have revolutionized structure prediction, but their nonlinear nature obscures how sequence representations inform structure prediction. While sparse autoencoders (SAEs) offer a path to interpretability here by learning linear representations in high-dimensional space, their application has been limited to smaller protein language models unable to perform structure prediction. In this work, we make two key advances: (1) we scale SAEs to ESM2-3B, the base model for ESMFold, enabling mechanistic interpretability of protein structure prediction for the first time, and (2) we adapt Matryoshka SAEs for protein language models, which learn hierarchically organized features by forcing nested groups of latents to reconstruct inputs independently. We demonstrate that our Matryoshka SAEs achieve comparable or better performance than standard architectures. Through comprehensive evaluations, we show that SAEs trained on ESM2-3B significantly outperform those trained on smaller models for both biological concept discovery and contact map prediction. Finally, we present an initial case study demonstrating how our approach enables targeted steering of ESMFold predictions, increasing structure solvent accessibility while fixing the input sequence. To facilitate further investigation by the broader community, we open-source our code, dataset, pretrained models, and visualizer.
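The nested-reconstruction idea behind Matryoshka SAEs can be illustrated with a minimal sketch: each prefix of the latent vector must reconstruct the input on its own, and the per-prefix errors are summed. This is not the authors' implementation. The ReLU encoder, the mean-squared-error objective, the omission of any sparsity penalty, and all names (`encode`, `decode_prefix`, `matryoshka_loss`, `group_sizes`) are assumptions made for illustration.

```python
# Hypothetical sketch of a Matryoshka-style reconstruction loss (pure Python).
# Assumptions: ReLU encoder, linear decoder, MSE per nested prefix, no sparsity term.

def encode(x, W_enc, b_enc):
    """Map an input vector to non-negative latents (one row of W_enc per latent)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W_enc, b_enc)]
    return [max(0.0, p) for p in pre]

def decode_prefix(z, W_dec, b_dec, k):
    """Reconstruct the input using only the first k latents."""
    out = list(b_dec)
    for j in range(k):
        for i in range(len(out)):
            out[i] += z[j] * W_dec[j][i]
    return out

def matryoshka_loss(x, W_enc, b_enc, W_dec, b_dec, group_sizes):
    """Sum the reconstruction MSE over nested latent prefixes.

    group_sizes lists the prefix lengths, e.g. [1, 2] means the first latent
    alone and then both latents must each reconstruct x.
    """
    z = encode(x, W_enc, b_enc)
    per_prefix = []
    for k in group_sizes:
        x_hat = decode_prefix(z, W_dec, b_dec, k)
        per_prefix.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return sum(per_prefix), per_prefix

# Toy usage: 2-dimensional input, 2 latents, identity encoder and decoder.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
total, per_prefix = matryoshka_loss([1.0, 2.0], identity, zeros, identity, zeros, [1, 2])
print(total, per_prefix)  # 2.0 [2.0, 0.0]
```

In the toy case the first latent alone recovers only the first coordinate, so the shortest prefix carries all of the error. Training against the summed loss pushes early latents toward features that explain as much of the input as possible, which is one way to read the hierarchical organization described in the abstract.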

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Presenter: ~John_Jingxuan_Yang1

Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.

Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.

Submission Number: 70
