From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation

Nagham Osman; Vittorio Lembo; Giovanni Bottegoni; Laura Toni

Published: 24 Sept 2025, Last Modified: 26 Dec 2025. Venue: NeurIPS 2025 AI4Science Poster. License: CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

Keywords: drug discovery, generative models, hit identification, docking score, evaluation, target-specific hit generation

Abstract: Hit identification is a critical yet resource-intensive step in the drug discovery pipeline, traditionally relying on high-throughput screening of large compound libraries. Despite advances in virtual screening, these methods remain time-consuming and costly. Recent progress in deep learning has enabled generative models that learn complex molecular representations and generate novel compounds de novo. However, using ML to replace the entire drug discovery pipeline is highly challenging. In this work, we instead investigate whether generative models can replace one step of the pipeline: hit-like molecule generation. To the best of our knowledge, this is the first study to explicitly frame hit-like molecule generation as a standalone task and to empirically test whether generative models can directly support this stage of the drug discovery pipeline. Specifically, we investigate whether such models can be trained to generate hit-like molecules, enabling direct incorporation into, or even substitution of, traditional hit identification workflows. We propose an evaluation framework tailored to this task, integrating physicochemical, structural, and bioactivity-related criteria within a multi-stage filtering pipeline that defines the hit-like chemical space. Two autoregressive models and one diffusion-based generative model were benchmarked across various datasets and training settings, and their outputs were assessed using standard metrics and target-specific docking scores. Our results show that these models can generate valid, diverse, and biologically relevant compounds across multiple targets; a few selected GSK-3β hits were synthesized and confirmed active in vitro. We also identify key limitations in current evaluation metrics and available training data.
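The idea of a multi-stage filtering pipeline that carves out a hit-like chemical space can be illustrated with a minimal Python sketch. The abstract does not state the actual criteria, so everything below is an assumption: Lipinski's rule-of-five limits stand in for the physicochemical stage, a precomputed boolean structural-alert flag (for example, the result of a PAINS-style substructure search) stands in for the structural stage, and a hypothetical docking-score cutoff of -7.0 kcal/mol stands in for the bioactivity stage. The `Candidate` schema, the function names, the thresholds and the toy values are all hypothetical, and molecular descriptors are assumed to have been computed upstream by some cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Precomputed properties of one generated molecule (hypothetical schema)."""
    mol_id: str
    mol_weight: float            # molecular weight in Da
    logp: float                  # estimated octanol/water partition coefficient
    h_donors: int                # hydrogen-bond donors
    h_acceptors: int             # hydrogen-bond acceptors
    has_structural_alert: bool   # assumed flag from an upstream substructure screen
    docking_score: float         # kcal/mol; more negative = stronger predicted binding


def passes_physchem(c: Candidate) -> bool:
    # Stage 1 (assumption): Lipinski rule-of-five limits as a physicochemical filter.
    return (c.mol_weight <= 500 and c.logp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def passes_structural(c: Candidate) -> bool:
    # Stage 2 (assumption): discard anything flagged by a structural alert.
    return not c.has_structural_alert


def passes_bioactivity(c: Candidate, threshold: float) -> bool:
    # Stage 3 (assumption): keep molecules whose docking score beats a hypothetical cutoff.
    return c.docking_score <= threshold


def hit_like_filter(candidates, docking_threshold=-7.0):
    """Apply the stages in order; return survivors and the count left after each stage."""
    stages = [
        ("physicochemical", passes_physchem),
        ("structural", passes_structural),
        ("bioactivity", lambda c: passes_bioactivity(c, docking_threshold)),
    ]
    survivors = list(candidates)
    remaining = {"input": len(survivors)}
    for name, check in stages:
        survivors = [c for c in survivors if check(c)]
        remaining[name] = len(survivors)
    return survivors, remaining


# Toy, made-up property values purely for illustration.
toy_batch = [
    Candidate("toy_1", 46.1, 0.0, 1, 1, False, -3.2),    # fails the docking cutoff
    Candidate("toy_2", 420.5, 3.1, 2, 6, False, -8.4),   # passes every stage
    Candidate("toy_3", 612.0, 6.2, 3, 8, False, -9.9),   # too heavy and too lipophilic
    Candidate("toy_4", 350.0, 2.0, 1, 4, True, -8.0),    # carries a structural alert
]

survivors, remaining = hit_like_filter(toy_batch)
print([c.mol_id for c in survivors], remaining)
```

Running the cheap descriptor checks before the docking stage and recording how many molecules survive each step is a design choice of this sketch: it shows which criterion removes most generated candidates, and in a real workflow it would avoid docking molecules that fail the inexpensive filters anyway.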

Submission Number: 426
