PED-X-Bench: A Benchmark of Adult-to-Pediatric Extrapolation Decisions in FDA Drug Labels

Apoorva Srinivasan, Jacob Berkowitz, Nadine A. Friedrich, Kevin Tsang, Aditi Kuchi, José Acitores, Michael Zietz, Ryan S. Czarny, Hongyu Liu, Nicholas P. Tatonetti

Published: 23 May 2025, Last Modified: 07 Jan 2026. License: CC BY-SA 4.0
Abstract: Pediatric trials are ethically and logistically difficult, so the U.S. FDA often extrapolates adult data to children when justified. Yet no public resource systematically documents these decisions. We present PED-X-Bench, the first dataset and benchmark that encodes FDA pediatric-extrapolation outcomes as a four-way classification task (Full, Partial, None, Unlabeled). PED-X-Bench contains 737 FDA drug-label sections (≈ 1 M words of source text) for approvals issued 2007–2024 across all therapeutic areas. A two-stage o3-mini prompting pipeline mined full FDA label text; nine domain reviewers then adjudicated a stratified sample of 135 labels, yielding an accuracy and F1 of 0.74 and 0.63, respectively (inter-annotator κ = 0.678), and spot-checked the remainder. For every drug, we release the ground-truth label, concise efficacy and pharmacokinetic/safety summaries, and harmonized study metadata. To showcase utility, we release two baseline models: (i) a logistic-regression classifier that uses structured metadata from FDA’s pediatric-labeling dataset, and (ii) a fine-tuned BigBird BERT that ingests full label text. Both baselines perform modestly, leaving ample headroom for future work. PED-X-Bench enables research on pediatric drug development, clinical NLP, and drug safety; the dataset card and code are available at github.com/tatonetti-lab/PedXBench and huggingface.co/datasets/apoorvasrinivasan/Ped-X-Bench.