---
license: cc
---
These datasets are for testing bias in LLMs. 
They were curated by asking GPT-4 to rate, on a scale from 1 to 5, whether changing the gender/ethnicity in each MedQA question would impact the diagnosis,
and selecting the questions rated 1 (no change).
The first 500 questions in the Ethnicity and Gender datasets are from the MedQA test set; the remaining questions are from the train set.
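
As a rough illustration of the curation and ordering described above, the Python sketch below keeps questions rated 1 and then separates the first 500 rows from the rest. The field names (`question`, `impact`), the helper names, and the placeholder rows are all hypothetical and do not reflect the datasets' actual schema. It also assumes that the questions rated 1 are the ones retained.

```python
# Hypothetical sketch: field names and helpers are assumptions,
# not the datasets' real schema or the authors' actual pipeline.

N_FROM_TEST = 500  # per the description: the first 500 questions come from the test set


def keep_no_change(records, no_change_score=1):
    """Keep records whose 1-5 impact rating equals the 'no change' score."""
    return [r for r in records if r["impact"] == no_change_score]


def split_by_origin(rows, n_test=N_FROM_TEST):
    """Return (rows from the test set, rows from the train set) by position."""
    return rows[:n_test], rows[n_test:]


# Toy ratings to show the filtering step.
rated = [
    {"question": "Q1", "impact": 1},
    {"question": "Q2", "impact": 4},
    {"question": "Q3", "impact": 1},
]
kept = keep_no_change(rated)

# Placeholder rows to show the positional split; 620 is an arbitrary size.
rows = [{"id": i} for i in range(620)]
from_test, from_train = split_by_origin(rows)
```

Splitting by position this way makes it possible to evaluate only on the test-derived questions, for example when a model may have seen the MedQA train set.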

