DrugNav: A Benchmark Dataset of Expert Trajectories for Developing and Evaluating LLM Agents in Multi-Step Drug Discovery

Published: 24 Sept 2025, Last Modified: 15 Oct 2025
Venue: NeurIPS2025-AI4Science Poster
License: CC BY 4.0
Track: Track 2: Dataset Proposal Competition
Keywords: LLM Agents, Drug Discovery, Tool Calling, Reinforcement Learning, Agentic AI
Abstract: The potential of Large Language Models (LLMs) in drug discovery is constrained by inadequate benchmarks. Current benchmarks, which focus on single-tool calls in general scientific domains, fail to capture the multi-step reasoning and execution that pharmaceutical R&D requires. To address this gap, we introduce DrugNav, an open dataset of expert tool-calling trajectories tailored to drug discovery. DrugNav consists of high-fidelity, sequential tool interactions that solve complex queries spanning target identification to lead optimization. Each trajectory documents the complete workflow of tool calls, intermediate reasoning, and outcomes, providing the data needed to train agentic models on multi-tool tasks. Because it offers a curated set of successful solution pathways, DrugNav is designed to support end-to-end, tool-integrated Reinforcement Learning for LLM agents. We expect this work to accelerate the development of capable autonomous systems, with the aim of reducing the time and cost of drug discovery and advancing AI-driven science.
Submission Number: 401
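
The abstract describes each trajectory as a sequence of tool calls with intermediate reasoning and outcomes. As a rough illustration only, one such record might be modeled as below. The class names, field names, tool names, and placeholder values are all hypothetical; the abstract does not specify DrugNav's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One step in a trajectory (hypothetical fields, not DrugNav's schema)."""
    tool: str                  # name of the tool invoked
    arguments: dict[str, Any]  # inputs passed to the tool
    reasoning: str             # intermediate reasoning recorded before the call
    outcome: Any               # result returned by the tool


@dataclass
class Trajectory:
    """An expert solution path for one drug-discovery query (assumed layout)."""
    query: str
    steps: list[ToolCall] = field(default_factory=list)

    def tool_sequence(self) -> list[str]:
        """Return the ordered tool names, e.g. for comparing agent plans."""
        return [step.tool for step in self.steps]


# Illustrative record with placeholder values; the tool names are invented.
example = Trajectory(
    query="<complex drug-discovery query>",
    steps=[
        ToolCall(
            tool="target_lookup",
            arguments={"disease": "<disease name>"},
            reasoning="<why a target is identified first>",
            outcome="<candidate target>",
        ),
        ToolCall(
            tool="lead_optimizer",
            arguments={"target": "<candidate target>"},
            reasoning="<why the lead is optimized against this target>",
            outcome="<optimized lead>",
        ),
    ],
)

print(example.tool_sequence())
```

Keeping the reasoning and outcome alongside each call, rather than storing only the final answer, is what would let such a record serve both as a supervised demonstration and as a reference path for tool-integrated reinforcement learning.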