Evaluating Multi-Modal Models for Enzyme-Reaction Retrieval

ICML 2025 Workshop FM4LS Submission 39 Authors

Published: 12 Jul 2025, Last Modified: 12 Jul 2025, Venue: FM4LS 2025, License: CC BY 4.0

Keywords: enzyme, protein, enzyme retrieval, catalytic sites, protein structure, protein language models, multi-modal learning, contrastive learning, binary classification

Abstract: Identifying functional enzymes that can perform unannotated reactions is a major biotechnological bottleneck. While multi-modal machine learning models can be used to retrieve enzymes given target functions (reactions), existing methods have not been adequately compared to each other. Two key areas warrant further investigation: first, the optimal way to incorporate 3D protein structure and predicted binding pockets for enzyme retrieval, and second, the most effective learning objectives for training such multi-modal models. We examine these questions through experiments on Task 2 of the Classification and Retrieval for Enzymes (CARE) benchmark. First, we demonstrate that multi-modal representations combining protein structure with pocket information outperform sequence-only methods. Second, we evaluate learning objectives and find that contrastive learning generally outperforms binary classification for enzyme retrieval. Our work underscores the value of integrating structural and pocket information for precise enzyme-reaction matching and offers insights into effective training objectives for such retrieval models.

Submission Number: 39