Keywords: Virtual screening, Drug discovery
Abstract: Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods—whether physics-based or deep learning-based—are developed around *holo* protein structures with known ligand-bound pockets. Consequently, their performance degrades significantly on *apo* or predicted structures such as those from AlphaFold2, which are more representative of real-world early-stage drug discovery, where pocket information is often missing. In this paper, we introduce an alignment-and-aggregation framework that enables accurate virtual screening under structural uncertainty. Our method comprises two core components: (1) a tri-modal contrastive learning module that aligns representations of the ligand, the *holo* pocket, and cavities detected on the protein structure, thereby enhancing robustness to pocket localization errors; and (2) a cross-attention-based adapter that dynamically aggregates candidate binding sites, enabling the model to learn from activity data even without precise pocket annotations. We evaluated our method on a newly curated benchmark of *apo* structures, where it significantly outperforms state-of-the-art methods in the blind *apo* setting, improving the early enrichment factor (EF1\%) from 11.75 to 37.19. Notably, it also maintains strong performance on *holo* structures. These results demonstrate the promise of our approach in advancing first-in-class drug discovery, particularly in scenarios lacking experimentally resolved protein-ligand complexes. Our implementation is publicly available at [https://github.com/Wiley-Z/AANet](https://github.com/Wiley-Z/AANet).
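To make the two components named in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation) of (1) a tri-modal InfoNCE-style contrastive loss aligning ligand, *holo*-pocket, and detected-cavity embeddings, and (2) a cross-attention adapter in which the ligand embedding queries a set of candidate cavities. All module names, dimensions, and the choice of InfoNCE are assumptions for illustration only.

```python
# Hypothetical sketch of the two ideas described in the abstract; not the AANet code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of embeddings (positives on the diagonal)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def tri_modal_loss(z_ligand, z_pocket, z_cavity):
    """Align ligand / holo-pocket / detected-cavity embeddings pairwise."""
    return (info_nce(z_ligand, z_pocket)
            + info_nce(z_ligand, z_cavity)
            + info_nce(z_pocket, z_cavity)) / 3.0


class CavityAggregator(nn.Module):
    """Cross-attention adapter: the ligand embedding attends over candidate binding sites."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, z_ligand, z_cavities, cavity_mask=None):
        # z_ligand: (B, D), z_cavities: (B, K, D) candidate cavities,
        # cavity_mask: optional (B, K) boolean, True for padded candidates.
        query = z_ligand.unsqueeze(1)                      # (B, 1, D)
        pooled, _ = self.attn(query, z_cavities, z_cavities,
                              key_padding_mask=cavity_mask)
        return pooled.squeeze(1)                           # (B, D) aggregated site embedding


if __name__ == "__main__":
    B, K, D = 8, 5, 256
    z_lig, z_poc, z_cav = (torch.randn(B, D) for _ in range(3))
    print("contrastive loss:", tri_modal_loss(z_lig, z_poc, z_cav).item())
    agg = CavityAggregator(D)
    print("aggregated shape:", agg(z_lig, torch.randn(B, K, D)).shape)
```

The aggregated site embedding could then be scored against the ligand embedding for screening, which is one plausible reading of how activity data can supervise the model without precise pocket annotations.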
Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 26314