Protein Fitness Landscape: Spectral Graph Theory Perspective

Hao Zhu; Daniel M. Steinberg; Piotr Koniusz

Published: 22 Jan 2025, Last Modified: 03 Oct 2025, Venue: AISTATS 2025 Poster, License: CC BY 4.0

TL;DR: Spectral graph theory is a natural way to describe protein fitness landscapes, and it helps guide the design of machine learning methods for protein engineering.

Abstract: In this work, we present a novel theoretical framework for analyzing and modeling protein fitness landscapes using spectral graph theory. By representing the protein sequence space as a generalized Hamming graph and studying its spectral properties, we derive a set of powerful tools for quantifying the ruggedness, epistasis, and other key characteristics of the landscape. We prove strong approximation and sampling results, showing that the landscape can be efficiently learned and optimized from limited and noisy data. Building on this foundation, we introduce Propagational Convolutional Neural Networks (PCNNs), a new class of inductive surrogate oracles. We provide rigorous theoretical guarantees on the generalization and convergence properties of PCNNs, using techniques from the Neural Tangent Kernel framework. Extensive experiments on real-world protein engineering tasks demonstrate the superiority of PCNNs over state-of-the-art methods, achieving higher fitness and better generalization from limited data.
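To make the Hamming-graph view concrete, here is a minimal sketch (not the paper's implementation) of the standard construction for a toy sequence space: build the graph Laplacian of the Hamming graph over all sequences of a given length and alphabet size, then measure how much of a fitness vector's energy falls in each Laplacian eigenspace. For this graph the Laplacian eigenvalues are known to be q·k for k = 0, ..., L, and a purely additive (non-epistatic) landscape lies entirely in the k = 0 and k = 1 eigenspaces, so energy at k ≥ 2 is one way to read off interaction effects. The function names `hamming_graph_laplacian` and `landscape_energy_by_order` and the toy sizes are assumptions for illustration; the dense construction only scales to very small spaces.

```python
import itertools

import numpy as np


def hamming_graph_laplacian(length, alphabet_size):
    """Return all sequences and the Laplacian of the Hamming graph H(length, q).

    Hypothetical helper: nodes are sequences, edges join sequences that
    differ at exactly one position. Dense, so only usable for toy sizes.
    """
    seqs = list(itertools.product(range(alphabet_size), repeat=length))
    arr = np.array(seqs)
    # Pairwise Hamming distances between all sequences.
    dist = (arr[:, None, :] != arr[None, :, :]).sum(axis=2)
    adjacency = (dist == 1).astype(float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return seqs, laplacian


def landscape_energy_by_order(fitness, laplacian, alphabet_size, length):
    """Split the squared norm of `fitness` across Laplacian eigenspaces.

    Eigenvalue q * k corresponds to order k, so entry k of the result is
    the energy of the landscape in the order-k eigenspace.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    coeffs = eigvecs.T @ fitness  # graph Fourier coefficients
    orders = np.rint(eigvals / alphabet_size).astype(int)
    energy = np.zeros(length + 1)
    np.add.at(energy, orders, coeffs ** 2)
    return energy


if __name__ == "__main__":
    L, q = 3, 3  # toy sizes chosen for illustration only
    seqs, lap = hamming_graph_laplacian(L, q)
    rng = np.random.default_rng(0)
    site_effects = rng.normal(size=(L, q))
    # Additive landscape: fitness is a sum of independent per-site effects.
    additive = np.array([sum(site_effects[i, s[i]] for i in range(L)) for s in seqs])
    print(landscape_energy_by_order(additive, lap, q, L))
```

Adding a random perturbation to the additive fitness vector would move energy into the higher-order entries, which is the sense in which the spectrum can serve as a ruggedness measure in this sketch.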

Full Paper: https://proceedings.mlr.press/v258/zhu25c.html

Submission Number: 974