BioCG: Constrained Generative Modeling for Biochemical Interaction Prediction

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Constrained Generative Modeling, Drug-Target Interaction (DTI), Drug Discovery, Enzyme-Catalyzed Reactions
Abstract: Predicting interactions between biochemical entities is a core challenge in drug discovery and systems biology, often hindered by limited data and poor generalization to unseen entities. Traditional discriminative models frequently underperform in such settings. We propose BioCG (Biochemical Constrained Generation), a novel framework that reformulates interaction prediction as a constrained sequence generation task. BioCG encodes target entities as unique discrete sequences via Iterative Residual Vector Quantization (I-RVQ) and trains a generative model to produce the sequence of an interacting partner given a query entity. A trie-guided constrained decoding mechanism, built from a catalog of valid target sequences, concentrates the model's learning on the critical distinctions between valid biochemical options, ensuring all outputs correspond to an entity within the pre-defined target catalog. An information-weighted training objective further focuses learning on the most critical decision points. BioCG achieves state-of-the-art (SOTA) performance across diverse tasks, Drug-Target Interaction (DTI), Drug-Drug Interaction (DDI), and Enzyme-Reaction Prediction, especially in data-scarce and cold-start conditions. On the BioSNAP DTI benchmark, for example, BioCG attains an AUC of 89.31\% on unseen proteins, representing a 14.3 percentage point gain over prior SOTA. By directly generating interacting partners from a known biochemical space, BioCG provides a robust and data-efficient solution for in-silico biochemical discovery.
Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 10050
Loading