Pareto-Guided Reinforcement Learning for Multi-Objective ADMET Optimization in Generative Drug Design
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Pareto-guided reinforcement learning, Multi-objective molecular generation, Predictor-driven ADMET optimization, De novo drug design
Abstract: Multi-objective optimization is fundamental to early drug discovery, where improving one ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) property often degrades others. Existing generative approaches commonly rely on scalarized rewards or descriptor-based objectives, limiting their ability to capture complex pharmacokinetic trade-offs. We present RL-Pareto, a Pareto-guided reinforcement learning framework that directly optimizes predictor-driven ADMET objectives using a transformer-based SELFIES generator and a panel of LightGBM property predictors. A compact reference Pareto set provides a dominance-based reward signal that preserves the structure of trade-offs while encouraging broad exploration. The framework scales flexibly from 1 to 22 simultaneous objectives without retraining and includes a natural-language interface that lets users specify goals in plain text. On a benchmark requiring simultaneous optimization of solubility and toxicity, RL-Pareto outperforms five strong baselines (PMMG, REINVENT, DrugEx-PCD, DrugEx-PTD, and GMD-MO-LSO), achieving 100% validity and novelty, strong diversity, and the highest hypervolume, which reflects the broadest Pareto-front expansion. RL-Pareto also attains the best solubility and lowest toxicity extremes. These results highlight RL-Pareto with predictor-driven feedback as a principled, scalable, and practical approach to multi-objective molecular design.
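The abstract describes a dominance-based reward computed against a compact reference Pareto set but does not spell out the formula. The minimal Python sketch below shows one plausible way such a reward could work, assuming all objectives are expressed so that higher is better; the function names, the bounded reward shape, and the non-dominance bonus are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b`
    (all objectives assumed to be maximized)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return bool(np.all(a >= b) and np.any(a > b))

def pareto_reward(candidate, reference_front, bonus=1.0):
    """Dominance-based reward for one generated molecule.

    candidate:       objective vector, e.g. [solubility, 1 - toxicity]
    reference_front: list of objective vectors forming the compact
                     reference Pareto set
    Returns a scalar: positive when the candidate dominates reference
    points, negative when it is dominated, plus a bonus if it is
    non-dominated (to push the front outward rather than cluster on it).
    """
    dominated_by = sum(dominates(ref, candidate) for ref in reference_front)
    n_dominated = sum(dominates(candidate, ref) for ref in reference_front)
    n = max(len(reference_front), 1)
    reward = (n_dominated - dominated_by) / n  # bounded in [-1, 1]
    if dominated_by == 0:
        reward += bonus
    return reward

def update_reference_front(front, candidate):
    """Insert `candidate` and drop any points it dominates, keeping
    the reference set non-dominated and compact."""
    if any(dominates(ref, candidate) for ref in front):
        return front  # candidate is dominated; front unchanged
    return [ref for ref in front if not dominates(candidate, ref)] + [candidate]

# Example: a candidate dominating both reference points earns a high reward
front = [[0.6, 0.4], [0.3, 0.8]]
print(pareto_reward([0.7, 0.9], front))          # 2.0 (dominates both + bonus)
front = update_reference_front(front, [0.7, 0.9])  # front collapses to the new point
```

A reward of this shape preserves the structure of the trade-offs (no fixed scalarization weights are needed) and extends to any number of objectives simply by lengthening the objective vectors, consistent with the claimed scaling from 1 to 22 objectives without retraining.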
Submission Number: 484