A geometric foundation model for enzyme retrieval with evolutionary insights

Yong Liu, Chenqing Hua, Menglong Xu, Tao Zeng, Jiahua Rao, Zhongyue Zhang, Ruibo Wu, Jing-Ke Weng, Connor W. Coley, Shuangjia Zheng

Published: 12 Feb 2026, Last Modified: 16 Feb 2026, Venue: Nature Catalysis, Readers: Everyone, License: CC BY-SA 4.0
Abstract: Enzyme catalysis drives chemical transformations essential for biological systems and diverse industrial applications. However, unravelling the complex relationships between enzymes and their catalytic reactions remains challenging. Here we introduce EnzymeCAGE, a catalytic-specific geometric foundation model trained on approximately 1.5 million structure-informed enzyme–reaction pairs spanning over 3,000 species. EnzymeCAGE integrates a geometry-aware multimodal architecture with evolutionary information to model the dependencies between enzyme structure, catalytic function and reaction specificity. We demonstrate that EnzymeCAGE accommodates both experimental and predicted enzyme structures and is applicable across a wide range of enzyme families and metabolites. Extensive evaluations reveal state-of-the-art performance in enzyme function prediction, reaction de-orphaning, catalytic site identification and biosynthetic pathway reconstruction, highlighting the potential of this approach to accelerate the discovery and engineering of advanced biocatalysts.

Predicting the function of enzymes remains difficult and current computational methods require improvement. Now EnzymeCAGE, a geometric deep learning model, has been developed to more accurately predict the functions of uncharacterized enzymes and reconstruct biosynthetic pathways.