Preferential Multi-Objective Bayesian Optimization for Drug Discovery

Tai Dang; Long-Hung Pham; Sang T. Truong; Ari Glenn; Wendy Nguyen; Edward A Pham; Jeffrey S. Glenn; Sanmi Koyejo; Thang Luong

Published: 06 Mar 2025, Last Modified: 26 Apr 2025, Venue: GEM, License: CC BY 4.0

Track: Machine learning: computational method and/or computational results

Nature Biotechnology: Yes

Keywords: Drug Discovery, Virtual Screening, Molecular Docking, Bayesian Optimization, Preference Learning, Human Feedback, Diffusion Models

TL;DR: Leveraging Bayesian Preferential Learning and diffusion-based docking models to enhance multi-objective virtual screening by incorporating chemists' preferences.

Abstract: Despite decades of advances in automated ligand screening, large-scale docking remains resource-intensive and requires post-processing hit selection, a step in which chemists manually select a few promising molecules based on their chemical intuition. This creates a major bottleneck in the virtual screening process for drug discovery, requiring experts to repeatedly balance complex trade-offs among drug properties across a vast pool of candidates. To improve the efficiency and reliability of this process, we propose a novel human-centered framework, CheapVS, that allows chemists to guide ligand selection through pairwise preference feedback. Our framework combines preferential multi-objective Bayesian optimization with an efficient diffusion docking model to capture human chemical intuition and improve hit identification. Specifically, on a library of 100K chemical candidates that target EGFR, a cancer-associated protein, CheapVS outperforms state-of-the-art docking methods in identifying drugs within a limited computational budget. Notably, our multi-objective algorithm can recover up to 16 out of 37 known drugs while scanning only 6% of the library, showcasing its potential to advance drug discovery. Code and data for these experiments can be found at https://anonymous.4open.science/r/vs-9A83.
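
To make the idea of pairwise preference feedback concrete, the following is a minimal sketch, not the CheapVS implementation. It assumes a linear utility over two hypothetical ligand properties and fits it with a Bradley-Terry (logistic) model by gradient ascent, whereas the abstract describes preferential multi-objective Bayesian optimization. The function name `fit_preference_weights`, the synthetic data, and the hidden weights `true_w` are all illustrative assumptions.

```python
import numpy as np


def fit_preference_weights(X, pairs, lr=0.5, steps=2000, l2=1e-3):
    """Fit a linear utility u(x) = w @ x from pairwise preferences.

    X     : (n, d) array of ligand properties (hypothetical features).
    pairs : list of (winner, loser) index pairs, i.e. the chemist
            preferred ligand `winner` over ligand `loser`.
    Uses a Bradley-Terry model: P(i preferred to j) = sigmoid(u_i - u_j).
    """
    X = np.asarray(X, dtype=float)
    win = np.array([p[0] for p in pairs])
    lose = np.array([p[1] for p in pairs])
    diff = X[win] - X[lose]              # (m, d) feature differences
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))   # P(winner preferred)
        # Gradient of the mean log-likelihood, minus an L2 penalty term.
        grad = diff.T @ (1.0 - p) / len(pairs) - l2 * w
        w += lr * grad                   # gradient ascent step
    return w


# Synthetic stand-in for a ligand library: 200 ligands, 2 properties
# (assumed here: column 0 = higher is better, column 1 = lower is better).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])           # hidden "chemist" trade-off (assumption)

# Simulate 300 noise-free pairwise comparisons from the hidden utility.
pairs = []
for _ in range(300):
    i, j = rng.choice(len(X), size=2, replace=False)
    pairs.append((i, j) if X[i] @ true_w > X[j] @ true_w else (j, i))

w_hat = fit_preference_weights(X, pairs)

# Rank the library by learned utility and keep the five best candidates.
top = np.argsort(-(X @ w_hat))[:5]
print("learned weights:", w_hat)
print("top candidates:", top)
```

The learned weights recover only the direction of the trade-off, not its scale, which is all a ranking needs. A Bayesian treatment would additionally keep uncertainty over the utility and use it to choose which ligands to dock and which pairs to ask the chemist about next.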

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Presenter: ~Sang_T._Truong1

Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.

Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.

Submission Number: 5
