DrugImprover: Utilizing Reinforcement Learning for Multi-Objective Alignment in Drug Optimization

Published: 25 Oct 2023, Last Modified: 12 Dec 2023 · AI4D3 2023 Oral
Keywords: Drug optimization, Reinforcement learning, Generative Model, AI alignment
Abstract: Reinforcement learning from human feedback (RLHF) is a method for enhancing the finetuning of large language models (LLMs), yielding notable performance improvements and better alignment with human values. Drawing inspiration from RLHF, this research turns to the domain of drug optimization. We employ reinforcement learning to finetune a drug optimization model, improving the original drug across multiple target objectives while retaining its beneficial chemical properties. Our proposal comprises three primary components: (1) DRUGIMPROVER, a framework tailored for improving robustness and efficiency in drug optimization; (2) a novel Advantage-alignment Policy Optimization (APO) algorithm with multi-critic guided exploration for finetuning objective-oriented properties; and (3) a dataset of 1 million compounds, each with OEDOCK docking scores against 5 human proteins associated with cancer cells and 24 proteins from the SARS-CoV-2 virus. We conduct a comprehensive evaluation of APO and demonstrate its effectiveness in improving the original drug across multiple properties. Our code and dataset are made public at: https://github.com/Argonne-National-Laboratory/DrugImprover.
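To make the multi-critic idea concrete, below is a minimal illustrative sketch of an advantage-weighted policy-gradient update in the spirit of APO. All names here (the toy SMILES-token policy, the placeholder critic functions, the equal objective weights) are assumptions for illustration and are not drawn from the paper; the actual DrugImprover implementation may differ.

```python
# Minimal sketch (assumptions throughout): a multi-critic, advantage-weighted
# policy-gradient update for a SMILES-like generator. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, HIDDEN, MAX_LEN = 64, 128, 32  # toy SMILES-token vocabulary

class TinyPolicy(nn.Module):
    """Small autoregressive token policy standing in for a drug-generation model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.gru = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens):
        h, _ = self.gru(self.embed(tokens))
        return self.head(h)  # next-token logits at each position

def critic_scores(sequences):
    """Placeholder critics: e.g. docking score, drug-likeness, similarity to the
    original drug. In practice these would call OEDOCK, RDKit, etc. (assumption)."""
    n = sequences.size(0)
    return torch.stack([torch.rand(n), torch.rand(n), torch.rand(n)], dim=1)

def multi_critic_update(policy, optimizer, batch_size=16):
    # Stand-in rollout: in a real setup these tokens would be sampled from the policy.
    tokens = torch.randint(0, VOCAB_SIZE, (batch_size, MAX_LEN))
    logits = policy(tokens[:, :-1])
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, tokens[:, 1:].unsqueeze(-1)).squeeze(-1).sum(dim=1)

    # One advantage per critic, normalized within the batch, then averaged so that
    # improvement must be balanced across objectives rather than dominated by one.
    scores = critic_scores(tokens)                        # (batch, n_critics)
    adv = (scores - scores.mean(0)) / (scores.std(0) + 1e-8)
    combined_adv = adv.mean(dim=1)                        # equal weights: an assumption

    loss = -(combined_adv.detach() * token_logp).mean()   # advantage-weighted policy gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

policy = TinyPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
print(multi_critic_update(policy, optimizer))
```

The key design point the sketch tries to convey is that each objective contributes its own normalized advantage signal, so the finetuned policy is pushed to improve across all target properties jointly rather than overfitting a single scalar reward.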
Submission Number: 73