How well do generative protein models generate?

Published: 04 Mar 2024, Last Modified: 29 Apr 2024
Venue: GEM Poster
License: CC BY 4.0
Track: Machine learning: computational method and/or computational results
Keywords: Protein Design, Generative Models, Evaluation, Benchmarks
TL;DR: How do you know if your sequences are any good!? We propose a comprehensive set of evaluation metrics for generated protein sequences.
Abstract: Protein design relies critically on the generation of plausible sequences. Yet the efficacy for sequence sampling of many common model architectures, from simple, interpretable models such as position-specific scoring matrices (PSSMs) and direct coupling analysis (DCA) to newer, less interpretable models such as variational autoencoders (VAEs), autoregressive large language models (AR-LLMs), and flow matching (FM), remains uncertain. While some models offer unique sequence generation methods, issues such as mode collapse, generation of nonsensical repeats, and protein truncations persist. Trusted methods such as Gibbs sampling are often preferred for their reliability but can be computationally expensive. This paper addresses the need to evaluate the performance and limitations of the generation methods available for different protein models, considering dependencies on multiple sequence alignment (MSA) depth and the available sequence diversity. We propose rigorous evaluation methods and metrics for assessing sequence generation, aiming to guide design decisions and inform the development of future models and sampling techniques for protein design applications.
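To make the computational-cost point concrete, here is a minimal sketch (not the authors' code) of Gibbs sampling from a toy DCA-style Potts model with hypothetical random fields `h` and couplings `J`: each sweep resamples every position from its conditional distribution given the rest of the sequence, costing O(L^2 q) per sweep, which is why long chains over deep MSAs become expensive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: L positions, q amino-acid states.
L, q = 10, 20
h = rng.normal(scale=0.1, size=(L, q))           # per-position fields
J = rng.normal(scale=0.05, size=(L, L, q, q))    # pairwise couplings
J = (J + J.transpose(1, 0, 3, 2)) / 2            # symmetrize J_ij = J_ji^T
for i in range(L):
    J[i, i] = 0.0                                # no self-couplings

def gibbs_sample(n_sweeps=50):
    """One Gibbs chain: resample each position from its conditional."""
    seq = rng.integers(q, size=L)
    for _ in range(n_sweeps):
        for i in range(L):
            # Conditional logits at position i given all other positions:
            # h_i(a) + sum_{j != i} J_ij(a, seq_j)
            logits = h[i] + sum(J[i, j, :, seq[j]] for j in range(L) if j != i)
            p = np.exp(logits - logits.max())
            p /= p.sum()
            seq[i] = rng.choice(q, p=p)
    return seq

sample = gibbs_sample()
print(sample.shape)  # (10,)
```

Each sweep touches every (position, state, neighbor) triple, so the per-sample cost grows quadratically with sequence length, in contrast to a single forward pass for an amortized generator such as a VAE.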
Submission Number: 109