Integrating Conditional WGAN-GP and Reinforcement Learning for Automated De Novo Drug Design

Samarth Mishra; Sowmiya B; Dr. Pushpalatha M

20 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0

Keywords: De novo drug discovery, conditional GANs, WGAN-GP, policy gradient reinforcement learning, molecular generation, synthetic accessibility, pharmacological filters

TL;DR: A cWGAN-GP with RL fine-tuning, conditioned on QED, logP and Lipinski’s Rule of Five, biases SELFIES-based molecule generation toward drug-like space. On ZINC, it improves validity, uniqueness and QED over baselines, yielding a stable, interpretable pipeline for de novo design.

Abstract: Drug discovery remains time-consuming and costly, involving repetitive experimental screening and molecular simulations. The chemical space, estimated to contain about 10^60 possible compounds, remains challenging to navigate efficiently. Although generative adversarial networks (GANs) have shown promise in de novo molecular design, they are often limited by training instability, mode collapse and insufficient control over pharmacokinetic properties. In this work, we propose an integrated framework that combines a conditional Wasserstein GAN with gradient penalty (cWGAN-GP) and policy-gradient reinforcement learning (RL) to guide molecule generation toward drug-like properties. The model is conditioned on three key criteria: the quantitative estimate of drug-likeness (QED), the octanol-water partition coefficient (logP), and Lipinski’s Rule of Five. Reinforcement learning further refines the output distribution by rewarding molecules with favorable properties. We also introduce an explainability module that clarifies structure–property relationships, enabling more rational candidate selection. On the ZINC dataset, our approach yields a 6% increase in chemical validity and a 10% increase in novelty over the baseline models, demonstrating improved stability, diversity and biological relevance. These results highlight the framework’s potential to accelerate early-stage drug discovery and reduce its cost.
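
The abstract does not say how the conditioning signal or the RL reward is built. As a rough illustration only, the sketch below assumes the descriptors are computed elsewhere (for example with a cheminformatics toolkit) and shows one plausible way to turn QED, logP and a Rule of Five check into a condition vector and a scalar reward. The `Descriptors` container, the vector layout and the `reward` function are hypothetical and are not taken from the paper. The four thresholds are the standard Rule of Five limits, and tolerating at most one violation is a common convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count
    qed: float          # quantitative estimate of drug-likeness, in [0, 1]


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def condition_vector(d: Descriptors) -> List[float]:
    """Assumed layout: [QED, logP, Lipinski pass flag]."""
    passes = 1.0 if lipinski_violations(d) <= 1 else 0.0
    return [d.qed, d.logp, passes]


def reward(d: Descriptors) -> float:
    """Assumed reward: QED, zeroed when the Lipinski check fails."""
    return d.qed if lipinski_violations(d) <= 1 else 0.0


# Illustrative descriptor values, not measurements from the paper.
small = Descriptors(mol_weight=180.2, logp=1.3, h_donors=1, h_acceptors=3, qed=0.55)
large = Descriptors(mol_weight=650.0, logp=6.2, h_donors=2, h_acceptors=8, qed=0.2)
print(condition_vector(small), reward(small))
print(condition_vector(large), reward(large))
```

In a policy-gradient setup, a scalar such as `reward` would weight the log-probability of each sampled SELFIES string, while a vector such as `condition_vector` would be fed to both the generator and the critic. The paper may combine these signals differently.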

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 25619
