Keywords: Large Language Models, Drug Discovery, Protein-Ligand Optimization, Absolute Binding Free Energy, Supervised Fine-Tuning
TL;DR: We introduce several optimizations to current SOTA LLM methods for molecular optimization, focusing on improving predicted binding free energies of generated compounds.
Abstract: Large language models (LLMs) have recently emerged as a promising tool for small-molecule generation in drug discovery. One notable recent state-of-the-art work in this field is MOLLEO, which combines an evolutionary algorithm with an LLM that acts as the operator performing crossovers and mutations on the ligand population. MOLLEO demonstrates strong results on optimizing molecular docking scores, but several aspects of its design are not well suited to real-world drug discovery. We introduce MOLLEO+, an optimized LLM workflow for de novo molecule generation. First, we replace docking with the recently released biomolecular foundation model Boltz-2 as an oracle, which improves the binding affinity of generated molecules, as predicted by gold-standard molecular dynamics, by over 100\%. Second, we incorporate knowledge of existing ligands, which is available in most practical drug discovery scenarios, by using ligands from BindingDB instead of ZINC 250k as the starting population for the genetic algorithm. Third, we propose a fine-tuning strategy to better steer existing ligands towards higher activity. We demonstrate the superiority of MOLLEO+ on the receptor tyrosine kinase c-MET and the BRD4 protein, improving over state-of-the-art baselines by up to 20\% in Boltz-2-predicted binding affinity.
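The abstract describes an evolutionary loop in which an LLM serves as the crossover/mutation operator and an oracle (Boltz-2 in MOLLEO+) scores candidates. A minimal sketch of that loop, with hypothetical stand-ins `llm_propose` for the LLM operator and `oracle` for the affinity predictor (neither reflects the actual MOLLEO+ implementation):

```python
import random

def llm_propose(parents):
    # Placeholder for an LLM call that crosses over / mutates parent SMILES;
    # here it is a toy string recombination for illustration only.
    a, b = parents
    return a[:len(a) // 2] + b[len(b) // 2:]

def oracle(smiles):
    # Placeholder fitness; a real oracle would query Boltz-2 (or docking)
    # for a predicted binding affinity.
    return len(set(smiles))

def evolve(population, generations=5, pool_size=10):
    # Standard (mu + 1) evolutionary loop: score, truncate, propose a child.
    pool = list(population)
    for _ in range(generations):
        scored = sorted(pool, key=oracle, reverse=True)[:pool_size]
        parents = random.sample(scored, 2)
        pool = scored + [llm_propose(parents)]
    return max(pool, key=oracle)

random.seed(0)
seeds = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
best = evolve(seeds)
```

Because truncation selection keeps the top of the pool each generation, the returned molecule scores at least as well under the oracle as the best starting ligand; swapping in a real LLM operator and a Boltz-2 scorer recovers the workflow the abstract outlines.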
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 18790