What About the Children? Evaluating and Mitigating Ageism in Medical QA Benchmarks

Published: 06 Mar 2025, Last Modified: 10 Apr 2025
Venue: ICLR 2025 Workshop AI4CHL Poster
License: CC BY 4.0
Track: Full paper
Keywords: question answering, pediatrics, benchmark dataset
TL;DR: This work quantifies, analyses, and mitigates age bias in medical benchmarks.
Abstract: Despite significant advancements in medical question-answering (QA) systems powered by large language models (LLMs), pediatric medicine remains underrepresented in both research and dataset development. This imbalance stems from fundamental physiological and developmental differences between children and adults, as well as a historical bias favoring adult-centric medical literature. As a result, LLMs trained on existing medical corpora may exhibit age-related biases, leading to suboptimal performance in pediatric contexts. In this work, we systematically assess the extent of pediatric underrepresentation in existing medical QA benchmarks, quantifying both the prevalence and impact of age-related biases. To address these gaps, we introduce a novel evaluation benchmark specifically curated to enhance pediatric medical representation. By incorporating diverse pediatric sources, our dataset provides a more equitable foundation for evaluating LLM performance across different age groups. Our findings highlight the critical need for age-inclusive AI-driven medical tools, aligning with broader efforts in precision medicine and equitable healthcare.
Submission Number: 13