Protein Fitness Landscape: Spectral Graph Theory Perspective
TL;DR: Spectral graph theory offers a natural way to describe protein fitness landscapes and helps us design machine learning methods for protein engineering.
Abstract: In this work, we present a novel theoretical framework for analyzing and modeling protein fitness landscapes using spectral graph theory. By representing the protein sequence space as a generalized Hamming graph and studying its spectral properties, we derive a set of tools for quantifying the ruggedness, epistasis, and other key characteristics of the landscape. We prove approximation and sampling results showing that the landscape can be efficiently learned and optimized from limited and noisy data. Building on this foundation, we introduce Propagational Convolutional Neural Networks (PCNNs), a new class of inductive surrogate oracles. We provide theoretical guarantees on the generalization and convergence properties of PCNNs, using techniques from the Neural Tangent Kernel framework. Extensive experiments on real-world protein engineering tasks show that PCNNs outperform state-of-the-art methods, achieving higher fitness and better generalization from limited data.
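To make the graph-spectral view in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It builds the Hamming graph over all sequences of a toy length and alphabet size, computes the graph Laplacian, and splits a fitness vector's energy across eigenspaces. It relies on the standard fact that the Laplacian of the Hamming graph H(L, q) has eigenvalues q·k for k = 0, …, L, where k can be read as an interaction (epistasis) order. The function names `hamming_graph_laplacian` and `energy_by_order`, and the toy sizes, are assumptions chosen for illustration only.

```python
import itertools
from math import comb

import numpy as np


def hamming_graph_laplacian(length, alphabet_size):
    """Return all sequences and the Laplacian of the Hamming graph H(length, alphabet_size).

    Two sequences are adjacent when they differ at exactly one position.
    Only feasible for toy sizes: the graph has alphabet_size**length nodes.
    """
    seqs = list(itertools.product(range(alphabet_size), repeat=length))
    arr = np.array(seqs)
    # Pairwise Hamming distances between all sequences.
    dist = (arr[:, None, :] != arr[None, :, :]).sum(axis=2)
    adjacency = (dist == 1).astype(float)
    degree = np.diag(adjacency.sum(axis=1))
    return seqs, degree - adjacency


def energy_by_order(fitness, laplacian, length, alphabet_size):
    """Split the squared norm of `fitness` across Laplacian eigenspaces.

    The eigenvalue alphabet_size * k corresponds to order k, so entry k of
    the result is the energy carried by order-k interactions.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    coeffs = eigvecs.T @ fitness
    orders = np.rint(eigvals / alphabet_size).astype(int)
    energy = np.zeros(length + 1)
    for k, c in zip(orders, coeffs):
        energy[k] += c ** 2
    return energy


# Toy example (hypothetical sizes): length 3, alphabet of 3 letters, 27 sequences.
L_toy, q_toy = 3, 3
seqs, lap = hamming_graph_laplacian(L_toy, q_toy)

# A purely additive landscape: fitness is a sum of independent per-site effects.
rng = np.random.default_rng(0)
site_effects = rng.normal(size=(L_toy, q_toy))
additive_fitness = np.array(
    [sum(site_effects[i, a] for i, a in enumerate(s)) for s in seqs]
)

energy = energy_by_order(additive_fitness, lap, L_toy, q_toy)
print("energy per order:", np.round(energy, 6))
```

For an additive landscape such as this one, the energy sits entirely in orders 0 and 1. Energy at higher orders would indicate epistasis, and a larger share at high orders corresponds to a more rugged landscape.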
Submission Number: 974