Drug-ProGO: A Gene Ontology-Enhanced Contrastive Learning Framework for Drug Virtual Screening with Multi-modality Whole-Protein Input

15 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | License: CC BY 4.0
Keywords: Drug virtual screening; AI for science; Gene ontology; Contrastive Learning
Abstract: Virtual screening accelerates early-stage drug discovery by efficiently identifying promising small-molecule candidates. Most existing methods depend on known binding pockets, but protein-level virtual screening has recently gained attention because it remains applicable when pocket information is incomplete or unavailable. In this work, we propose Drug-ProGO, a Gene Ontology (GO)-enhanced contrastive learning framework that integrates GO information during training to enrich protein representations. By capturing the functional similarity between unseen and known proteins, the model generalizes better to unseen proteins and can better infer compatibility between novel proteins and small molecules. Our framework supports flexible protein inputs: sequence, structure, or their combination. In the dual-modality setting, the two modalities are processed independently, and their prediction scores are combined by an uncertainty-aware fusion mechanism that introduces no additional trainable parameters. Extensive experiments across four virtual screening benchmarks and across input settings demonstrate that incorporating GO knowledge consistently improves performance, highlighting the importance of functional knowledge integration for protein-level virtual screening.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 5966
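
Illustrative sketch (not the authors' implementation): the abstract states that sequence-based and structure-based prediction scores are fused by an uncertainty-aware mechanism with no additional trainable parameters, but it does not give the fusion rule. The Python snippet below shows one possible parameter-free scheme under loudly stated assumptions: each modality's uncertainty is taken to be the normalized entropy of its softmax distribution over candidate molecules, and the fusion weights are proportional to one minus that entropy. The function names `confidence` and `fuse_scores` and the entropy-based weighting are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()


def confidence(scores):
    """Assumed confidence measure: 1 - normalized entropy, in [0, 1].

    A peaked score distribution (low entropy) gives a value near 1;
    a flat distribution (high entropy) gives a value near 0.
    """
    p = softmax(scores)
    entropy = -np.sum(p * np.log(p + 1e-12))
    return max(0.0, 1.0 - entropy / np.log(len(p)))


def fuse_scores(seq_scores, struct_scores):
    """Fuse two per-molecule score vectors without trainable parameters.

    Weights are proportional to each modality's confidence (an assumption,
    not the rule used in the paper). Returns (fused_scores, w_seq).
    """
    seq_scores = np.asarray(seq_scores, dtype=float)
    struct_scores = np.asarray(struct_scores, dtype=float)
    c_seq = confidence(seq_scores)
    c_struct = confidence(struct_scores)
    total = c_seq + c_struct
    # Fall back to equal weights when both modalities are uninformative.
    w_seq = 0.5 if total <= 1e-12 else c_seq / total
    fused = w_seq * seq_scores + (1.0 - w_seq) * struct_scores
    return fused, w_seq


# Toy example: the sequence branch is confident, the structure branch is not.
fused, w_seq = fuse_scores([5.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

In this toy example the sequence branch receives nearly all of the weight because the structure branch assigns identical scores to every candidate. Other parameter-free choices, such as inverse-variance weighting, would fit the abstract's description equally well.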