ReactEmbed: A Plug-and-Play Module for Unifying Protein-Molecule Representations Guided by Biochemical Reaction Networks

TMLR Paper6446 Authors

09 Nov 2025 (modified: 09 Dec 2025). Under review for TMLR. License: CC BY 4.0
Abstract: The computational representation of proteins and molecules is a cornerstone of modern biology. However, state-of-the-art models embed these entities in separate, incompatible spaces, which limits our ability to model the systemic biological processes that depend on their interaction. We introduce ReactEmbed, a lightweight, plug-and-play enhancement module that bridges this gap. Our key idea is to use biochemical reaction networks as a source of functional semantics, since co-participation in a reaction explicitly reflects a shared functional role. ReactEmbed takes existing, frozen embeddings from state-of-the-art models and aligns them in a unified space through a novel relational learning framework, which applies a specialized sampling strategy to a weighted reaction graph to distill functional relationships. This yields two benefits: (1) it enriches the unimodal embeddings, improving their performance on domain-specific tasks, and (2) it achieves strong results on a diverse range of cross-domain benchmarks. ReactEmbed thus provides a practical method to enhance and unify biological representations, turning disconnected models into a more cohesive, functionally aware system. The code and database are openly available.
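As an illustration of the alignment idea summarized in the abstract, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the sampling strategy or the training objective, so the sketch assumes weight-proportional edge sampling and a triplet margin loss on cosine distance. All names, dimensions, and data (`protein_emb`, `molecule_emb`, `edges`, `sample_edges`, `project`, `batch_loss`) are hypothetical, and only the loss computation is shown, not the optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen embeddings from two unimodal models (assumed sizes).
protein_emb = rng.normal(size=(3, 8))    # 3 proteins, dimension 8
molecule_emb = rng.normal(size=(3, 5))   # 3 molecules, dimension 5

# Hypothetical weighted reaction graph:
# (protein index, molecule index, number of shared reactions).
edges = [(0, 0, 5.0), (1, 1, 3.0), (2, 2, 1.0), (0, 1, 1.0)]


def sample_edges(edges, n, rng):
    """Sample n edges with probability proportional to edge weight (assumption)."""
    weights = np.array([w for _, _, w in edges])
    idx = rng.choice(len(edges), size=n, p=weights / weights.sum())
    return [edges[i] for i in idx]


def project(x, W):
    """Map frozen embeddings into the shared space and L2-normalize them."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)


def batch_loss(W_p, W_m, batch, rng, margin=0.2):
    """Mean triplet margin loss: pull co-reacting pairs together, push others apart."""
    linked = {(p, m) for p, m, _ in edges}
    zp = project(protein_emb, W_p)
    zm = project(molecule_emb, W_m)
    losses = []
    for p, m, _ in batch:
        # Negative: a molecule that shares no reaction with protein p.
        candidates = [j for j in range(len(zm)) if (p, j) not in linked]
        neg = candidates[rng.integers(len(candidates))]
        d_pos = 1.0 - zp[p] @ zm[m]      # cosine distance to the positive
        d_neg = 1.0 - zp[p] @ zm[neg]    # cosine distance to the negative
        losses.append(max(0.0, d_pos - d_neg + margin))
    return float(np.mean(losses))


# Only the projection matrices would be trained; the input embeddings stay frozen.
W_p = rng.normal(size=(8, 4))
W_m = rng.normal(size=(5, 4))
batch = sample_edges(edges, n=16, rng=rng)
loss = batch_loss(W_p, W_m, batch, rng)
```

In this sketch the unimodal embeddings are never modified; only the two projection matrices would receive gradient updates, which is one way to read the "plug-and-play" description above.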
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yingce_Xia1
Submission Number: 6446