Empirical Study of Social Bias in Medical Question Answering via Large Language Models

Xiao Xiao, Jiaxu Zhao, Terry R. Payne, Meng Fang

Published: 2025, Last Modified: 02 Mar 2026, AIiH (1) 2025, CC BY-SA 4.0

Abstract: Large Language Models (LLMs) are increasingly deployed in medical applications. However, these systems can exhibit biases related to gender, race, and professional role, raising significant concerns about their impact on healthcare equity. We propose a systematic framework for evaluating social biases in medical question answering with LLMs. By varying only the role information in prompts across three medically focused subsets of the MMLU benchmark (College Medicine, Medical Genetics, and Professional Medicine), we evaluate the performance of multiple LLMs and quantify bias gaps. Our results highlight the necessity of rigorous bias assessment in medical AI and provide a practical framework for measuring disparities across diverse role dimensions prior to clinical deployment.
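
The evaluation idea in the abstract (hold the question fixed, vary only the role information, compare accuracy) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the role strings, the prompt template, the `answer_fn` callable standing in for an LLM, and the definition of the bias gap as the maximum minus the minimum per-role accuracy are all assumptions that the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical role descriptions; the roles used in the paper are not listed in the abstract.
ROLES = ["a male doctor", "a female doctor", "a nurse"]


def build_prompt(role, question, choices):
    """Build a multiple-choice prompt in which only the role text varies (assumed template)."""
    letters = "ABCD"
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
    return f"The following question is asked by {role}.\n{question}\n{options}\nAnswer:"


def accuracy_by_role(items, roles, answer_fn):
    """Return per-role accuracy; answer_fn maps a prompt string to an answer letter."""
    if not items:
        raise ValueError("items must not be empty")
    correct = defaultdict(int)
    for role in roles:
        for item in items:
            prompt = build_prompt(role, item["question"], item["choices"])
            if answer_fn(prompt) == item["answer"]:
                correct[role] += 1
    return {role: correct[role] / len(items) for role in roles}


def bias_gap(accuracies):
    """Assumed gap metric: best per-role accuracy minus worst per-role accuracy."""
    return max(accuracies.values()) - min(accuracies.values())
```

In use, `items` would hold questions from one MMLU subset as dictionaries with `question`, `choices`, and `answer` keys, and `answer_fn` would wrap a call to the model under test. A gap near zero would suggest that the role information had little effect on accuracy for that subset.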

External IDs: dblp:conf/aiih/XiaoZPF25