Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management

Cécile Logé; Emily Ross; David Yaw Amoah Dadey; Saahil Jain; Adriel Saporta; Andrew Y. Ng; Pranav Rajpurkar

Published: 29 Jul 2021, Last Modified: 24 May 2023

NeurIPS 2021 Datasets and Benchmarks Track (Round 1)

Readers: Everyone

Keywords: Bias, Pain, Medicine, Healthcare, NLP, QA

TL;DR: We introduce a dataset and accompanying experimental design/analysis framework for assessing bias in medical QA in the context of pain management.

Abstract: Recent advances in Natural Language Processing (NLP), and specifically automated Question Answering (QA) systems, have demonstrated both impressive linguistic fluency and a pernicious tendency to reflect social biases. In this study, we introduce Q-Pain, a dataset for assessing bias in medical QA in the context of pain management, one of the most challenging forms of clinical decision-making. Along with the dataset, we propose a new, rigorous framework, including a sample experimental design, to measure the potential biases present when making treatment decisions. We demonstrate its use by assessing two reference Question-Answering systems, GPT-2 and GPT-3, and find statistically significant differences in treatment between intersectional race-gender subgroups, thus reaffirming the risks posed by AI in medical settings, and the need for datasets like ours to ensure safety before medical AI applications are deployed.

Supplementary Material: zip

URL: https://doi.org/10.13026/2tdv-hj07
