Keywords: reinforcement learning, molecule design, de novo design, ppo, sample-efficient reinforcement learning, chemistry
TL;DR: We investigate a variety of RL algorithms for molecular generation and define new benchmarks (to be released as an OpenAI Gym), finding PPO and a hill-climbing MLE algorithm work best.
Abstract: The design of small molecules with bespoke properties is of central importance to drug discovery. However significant challenges yet remain for computational methods, despite recent advances such as deep recurrent networks and reinforcement learning strategies for sequence generation, and it can be difficult to compare results across different works. This work proposes 19 benchmarks selected by subject experts, expands smaller datasets previously used to approximately 1.1 million training molecules, and explores how to apply new reinforcement learning techniques effectively for molecular design. The benchmarks here, built as OpenAI Gym environments, will be open-sourced to encourage innovation in molecular design algorithms and to enable usage by those without a background in chemistry. Finally, this work explores recent development in reinforcement-learning methods with excellent sample complexity (the A2C and PPO algorithms) and investigates their behavior in molecular generation, demonstrating significant performance gains compared to standard reinforcement learning techniques.
Data: [OpenAI Gym](https://paperswithcode.com/dataset/openai-gym)