Understanding Energy-Based Modeling of Proteins via an Empirically Motivated Minimal Ground Truth Model

Published: 28 Jul 2023, Last Modified: 28 Jul 2023, Venue: SynS & ML @ ICML2023
Keywords: Generative Model, EBM, Proteins, Potts Model, DCA, Statistical Physics, Amino Acids
TL;DR: We develop a ground truth model of protein sequences based upon experimental evidence and use it to benchmark the performance of a generative model.
Abstract: Energy-based models (EBMs) fitted to sequences from evolutionarily related protein families can learn the generic constraints needed to design novel functional sequences, and such designs have been validated by in vivo experiments. However, the learned energy functions must be re-scaled by a temperature parameter before sampling from them yields novel functional sequences. Here, we generate data from a minimal model motivated by a wide array of empirical evidence for a synergistic cluster of amino acids, or sector, within a sequence. We find that this setting captures salient learning behaviors similar to those exhibited by EBMs fitted to real proteins, namely the need for temperature tuning to improve generative performance. We discuss how this informs our understanding of the functional sequence space of proteins.
Submission Number: 58
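
The temperature re-scaling mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Potts-type energy with fields `h` and couplings `J` (suggested by the keywords, not confirmed as the paper's exact model) and draws sequences from p_T(s) ∝ exp(-E(s)/T) with a simple Metropolis sampler. All function and parameter names here are hypothetical and chosen only for illustration.

```python
import numpy as np

def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j] (assumed Potts form)."""
    L = len(seq)
    e = -h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

def sample_at_temperature(h, J, T, n_steps, rng):
    """Metropolis sampling from p_T(s) proportional to exp(-E(s) / T).

    T = 1 samples the learned distribution as is; T < 1 concentrates
    samples on low-energy sequences, T > 1 flattens the distribution.
    """
    L, q = h.shape
    seq = rng.integers(q, size=L)          # random initial sequence
    e = potts_energy(seq, h, J)
    for _ in range(n_steps):
        proposal = seq.copy()
        proposal[rng.integers(L)] = rng.integers(q)   # mutate one site
        e_new = potts_energy(proposal, h, J)
        # Accept with probability min(1, exp(-(E_new - E) / T)).
        if rng.random() < np.exp(min(0.0, -(e_new - e) / T)):
            seq, e = proposal, e_new
    return seq, e
```

For example, calling `sample_at_temperature(h, J, T=0.5, n_steps=10_000, rng=np.random.default_rng(0))` would draw one sequence from the sharpened distribution. The value 0.5 is an arbitrary illustration, not a setting taken from the paper.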