Abstract: <h3>Abstract</h3> <p>Pediatric trials are ethically and logistically difficult, so the U.S. FDA often extrapolates adult data to children when justified. Yet no public resource systematically documents these decisions. We present <b>PED-X-Bench</b>, the first dataset and benchmark that encodes FDA pediatric-extrapolation outcomes as a four-way classification task (<i>Full, Partial, None, Unlabeled</i>). PED-X-Bench contains 737 FDA drug-label sections (≈ 1 M words of source text) for approvals issued 2007–2024 across all therapeutic areas. A two-stage <i>o3-mini</i> prompting pipeline mined full FDA label text; nine domain reviewers then adjudicated a stratified sample of 135 labels, yielding accuracy and F1 of 0.74 and 0.63, respectively (inter-annotator κ = 0.678), and spot-checked the remainder. For every drug we release the ground-truth label, concise efficacy and pharmacokinetic/safety summaries, and harmonized study metadata. To showcase utility, we release two baseline models: (i) a logistic-regression classifier that uses structured metadata from FDA’s pediatric-labeling dataset, and (ii) a fine-tuned BigBird BERT that ingests full label text. Both baselines perform modestly, leaving ample headroom for future work. PED-X-Bench enables research on pediatric drug development, clinical NLP, and drug safety; the dataset card and code are available at github.com/tatonetti-lab/PedXBench and huggingface.co/datasets/apoorvasrinivasan/Ped-X-Bench</p>
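For readers unfamiliar with the agreement metrics quoted in the abstract, the following is a minimal sketch of how accuracy, F1, and a kappa statistic can be computed for the four-way label set. It is illustrative only: the toy labels are invented, the function names are hypothetical, macro-averaging of F1 is an assumption (the abstract does not state the averaging scheme), and the two-rater Cohen's kappa shown here may differ from the statistic the authors used with nine reviewers.

```python
from collections import Counter

# The four extrapolation classes named in the abstract.
LABELS = ["Full", "Partial", "None", "Unlabeled"]


def accuracy(y_true, y_pred):
    """Fraction of items where the two label sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that never appears in either sequence scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(rater_a, rater_b):
    """Two-rater Cohen's kappa: chance-corrected agreement."""
    n = len(rater_a)
    observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters labeled independently
    # according to their own marginal label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(rater_a) | set(rater_b)
    ) / n**2
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented toy data, not drawn from PED-X-Bench.
reviewer = ["Full", "Partial", "None", "Unlabeled",
            "Full", "Partial", "None", "Unlabeled"]
pipeline = ["Full", "Partial", "None", "Unlabeled",
            "Partial", "Partial", "None", "Full"]

print("accuracy:", accuracy(reviewer, pipeline))
print("macro F1:", round(macro_f1(reviewer, pipeline), 3))
print("kappa:   ", round(cohen_kappa(reviewer, pipeline), 3))
```

Kappa is reported alongside raw agreement because it discounts the matches that two raters would produce by chance given how often each uses every label.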
External IDs: doi:10.1101/2025.05.22.25328187