DMPKBench: A Multi-Modal Benchmark for Evaluating LLMs and Agents in Drug Discovery DMPK Tasks

Published: 24 Sept 2025, Last Modified: 26 Dec 2025, Venue: NeurIPS2025-AI4Science Poster, Readers: Everyone, License: CC BY 4.0
Additional Submission Instructions: For the camera-ready version, please include the author names and affiliations, funding disclosures, and acknowledgements.
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: LLM, Agent, Multi-modal Benchmark, Drug Discovery, DMPK
TL;DR: DMPKBench is a high-quality, comprehensive benchmark designed to evaluate the performance of LLMs and multi-agent systems on DMPK-related tasks in drug discovery.
Abstract: With the rapid progress of large language models (LLMs) and multi-agent systems, there is an increasing demand for fair and comprehensive evaluation of their capacity to address complex tasks in specialized scientific domains. Drug metabolism and pharmacokinetics (DMPK) constitutes a critical stage in drug discovery, requiring interdisciplinary reasoning and integration of diverse knowledge. We therefore constructed DMPKBench, a comprehensive benchmark designed to evaluate the performance of LLMs and multi-agent systems on DMPK-related tasks. Grounded in the real-world drug development pipeline, DMPKBench covers five core competencies essential to domain experts: experimental design and troubleshooting, interpretation of experimental results, ADMET multi-parameter optimization, pharmacokinetic (PK) modeling and simulation, and preclinical-to-clinical PK translation to humans. DMPKBench offers over 120k question-answer pairs, with four dimensions quality-controlled by specialists and one validated through experimental evidence. We comprehensively evaluated five models across the major DMPKBench modules, with accuracy ranging from 11% to 89%. A significant performance gap is observed: models excel in knowledge-driven benchmarks but struggle in multi-modal tasks such as drug structure understanding, analysis of real-world DMPK data tables and PK curves, and multi-step quantitative reasoning. Overall, DMPKBench offers a high-quality, domain-specific foundation for advancing LLMs and multi-agent systems in drug discovery and is publicly available at: https://github.com/GHDDI-AILab/DMPKBench.
Submission Number: 265