Towards Robust Evaluation of Protein Generative Models: A Systematic Analysis of Metrics

Pavel Strashnov; Andrey Shevtsov; Viacheslav Meshchaninov; Maria Ivanova; Fedor Nikolaev; Olga Kardymon; Dmitry Vetrov

28 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0

Keywords: evaluation metrics, protein, protein generative models

TL;DR: Systematic analysis of protein generative model evaluation metrics, revealing key insights for improved assessment practices.

Abstract: The rapid advancement of protein generative models necessitates robust and principled methods for their evaluation and comparison. As new models of increasing complexity continue to emerge, it is crucial to ensure that the metrics used for assessment are well understood and reliable. In this work, we conduct a systematic investigation of commonly used metrics for evaluating protein sequence generative models, focusing on quality, diversity, and distributional similarity. We examine the behavior of these metrics under various conditions, including synthetic perturbations and real-world generative models. Our analysis explores different design choices, parameters, and underlying representation models, revealing how these factors influence metric performance. We identify several challenges in applying these metrics, such as sample size dependencies, sensitivity to data distribution shifts, and computational efficiency trade-offs. By testing metrics on both synthetic datasets with controlled properties and outputs from state-of-the-art protein generators, we provide insights into each metric's strengths, limitations, and practical applicability. Based on our findings, we offer a set of practical recommendations for researchers to consider when evaluating protein generative models, aiming to contribute to the development of more robust and meaningful evaluation practices in the field of protein design.
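To make the "sample size dependencies" point concrete, here is a minimal, hypothetical sketch using one generic family of distributional-similarity metrics: a Fréchet distance between Gaussians fitted to two sets of embedding vectors (the same construction behind FID in image generation). This is not the authors' protocol. The function name `frechet_distance`, the embedding size `dim`, the sample sizes, and the use of random Gaussian vectors in place of real protein embeddings are all illustrative assumptions. Both sets are drawn from the same distribution, so the true distance is zero, yet the finite-sample estimate approaches zero only as the sample size grows.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) arrays.

    Generic illustration only; real evaluations would use embeddings from a
    protein representation model, which is assumed, not shown, here.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    sx = _psd_sqrt(cov_x)
    # tr((cov_x cov_y)^{1/2}) equals tr((sx cov_y sx)^{1/2}); the latter matrix is
    # symmetric PSD, so its eigenvalues are real and non-negative up to round-off.
    cross_eigs = np.linalg.eigvalsh(sx @ cov_y @ sx)
    tr_cross = np.sqrt(np.clip(cross_eigs, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_cross)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 32  # hypothetical embedding size
    for n in (50, 200, 1000, 5000):
        # Two independent samples from the SAME distribution: true distance is 0.
        a = rng.standard_normal((n, dim))
        b = rng.standard_normal((n, dim))
        print(n, round(frechet_distance(a, b), 3))
```

Running the script prints one estimate per sample size; the values are positive and shrink as `n` grows, which is why comparisons made with such a metric are only meaningful at matched sample sizes.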

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 13975
