PED-X-Bench: A Benchmark of Adult-to-Pediatric Extrapolation Decisions in FDA Drug Labels

ICLR 2026 Conference Submission20634 Authors

19 Sept 2025 (modified: 08 Oct 2025); ICLR 2026 Conference Submission; License: CC BY 4.0
Keywords: Biomedical NLP, clinical pharmacology, pediatric drug development, healthcare AI, drug safety
TL;DR: We release PED-X-Bench, a dataset of 737 FDA drug labels tagged as full, partial, or no adult-to-child extrapolation, each paired with concise efficacy/safety snippets and harmonized pediatric-study metadata.
Abstract: Pediatric clinical trials are often ethically complex, expensive, or infeasible, so the U.S. FDA extrapolates efficacy and safety evidence from adults to children when justified. However, no public resource systematically documents these regulatory decisions. We present PED-X-Bench, the first dataset and benchmark to encode FDA pediatric extrapolation outcomes as a four-class classification task (Full, Partial, None, Unlabeled). PED-X-Bench comprises 737 drug-label sections (~1M words) spanning 2007–2024 and diverse therapeutic areas. Evidence was extracted directly from FDA labels with a two-stage o3-mini prompting pipeline, and nine domain experts adjudicated a stratified sample of 135 records (κ = 0.72, macro-F1 = 0.63). For each drug, we provide the gold-standard extrapolation label, concise efficacy and pharmacokinetic (PK)/safety summaries, and harmonized study metadata. We benchmark models ranging from metadata-only classifiers to domain-adapted transformers and find that substantial headroom remains, underscoring the task's complexity. Beyond benchmarking, PED-X-Bench can support AI-assisted regulatory decision-support systems and safety-focused applications aimed at accelerating pediatric drug development and reducing off-label use. The dataset card, code, and annotations are included in the supplementary material and will be released publicly upon acceptance.
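The abstract reports κ = 0.72 and macro-F1 = 0.63 for the adjudicated sample but does not say which κ variant was used (with nine experts it could be Fleiss' κ or an average of pairwise Cohen's κ) or which two label sets the macro-F1 compares. The minimal sketch below assumes two-rater Cohen's κ and an unweighted mean of per-class F1 over the four classes named above (Full, Partial, None, Unlabeled), for example pipeline labels scored against expert labels. The function names and the toy label lists are hypothetical and are not taken from the dataset or its released code.

```python
from collections import Counter
from typing import Optional, Sequence

# Label set taken from the four classes named in the abstract.
LABELS = ("Full", "Partial", "None", "Unlabeled")


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Two-rater Cohen's kappa: chance-corrected agreement between label lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if both raters labeled independently at their own marginal rates.
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined, report 1.0.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Optional[Sequence[str]] = None) -> float:
    """Unweighted mean of per-class F1 (by default over labels seen in gold or pred)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty label sequences of equal length")
    if labels is None:
        labels = [k for k in LABELS if k in set(gold) | set(pred)]
    scores = []
    for k in labels:
        tp = sum(g == k and p == k for g, p in zip(gold, pred))
        fp = sum(g != k and p == k for g, p in zip(gold, pred))
        fn = sum(g == k and p != k for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy labels, not drawn from PED-X-Bench.
    expert = ["Full", "Full", "Partial", "None"]
    pipeline = ["Full", "Partial", "Partial", "None"]
    print(round(cohens_kappa(expert, pipeline), 3))  # 0.636
    print(round(macro_f1(expert, pipeline), 3))      # 0.778
```

Macro-averaging weights each class equally, so a rare class such as Partial counts as much as a frequent one; this is one common reason a macro-F1 figure can sit below a κ computed on the same labels.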
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 20634