Toward equitable documentation: Evaluating ChatGPT's role in identifying and rephrasing stigmatizing language in electronic health records
Abstract: Stigmatizing language in electronic health records (EHRs) harms clinician–patient relationships and reinforces health disparities. This study assessed ChatGPT's ability to reduce stigmatizing language in clinical notes. We analyzed 140 clinical notes and 150 stigmatizing-language examples from two urban hospitals. ChatGPT-4 identified and rephrased stigmatizing language. Identification performance was evaluated using precision, recall, and F1 score, with human expert annotations as the gold standard; rephrasing quality was rated by experts on a three-point Likert scale for de-stigmatization, faithfulness, conciseness, and clarity. ChatGPT showed poor overall identification (micro-F1 = 0.51) but moderate-to-high performance within individual stigmatizing-language categories (micro-F1 = 0.69–0.91). Rephrasings scored 2.7 for de-stigmatization, 2.8 for faithfulness, and 3.0 for both conciseness and clarity. Prompt design significantly affected ChatGPT's performance. Although ChatGPT's automatic identification is limited, with appropriate prompt design and human oversight it can support real-time identification and rephrasing of stigmatizing language in EHRs.
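To make the scoring concrete, here is a minimal Python sketch (not the authors' code) of the micro-averaged F1 used for identification: true positives, false positives, and false negatives are pooled across all stigmatizing-language categories before a single precision, recall, and F1 are computed. The category names and counts in the example are hypothetical, not the paper's taxonomy or data.

```python
# Illustrative sketch only -- not the study's code. Micro-averaging pools
# true positives (tp), false positives (fp), and false negatives (fn)
# across categories, then computes one precision/recall/F1 triple.

def micro_f1(counts_by_category):
    """Compute micro-averaged precision, recall, and F1 from
    per-category tp/fp/fn counts."""
    tp = sum(c["tp"] for c in counts_by_category.values())
    fp = sum(c["fp"] for c in counts_by_category.values())
    fn = sum(c["fn"] for c in counts_by_category.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts from comparing model-flagged spans to expert
# gold-standard spans; category names are invented for illustration.
counts = {
    "questioning_credibility": {"tp": 30, "fp": 12, "fn": 15},
    "disapproval":             {"tp": 22, "fp": 10, "fn": 14},
}
print(micro_f1(counts))  # -> (precision, recall, micro-F1)
```

Because micro-averaging weights every instance equally, categories with many examples dominate the overall score, which is one way an aggregate micro-F1 of 0.51 can coexist with stronger per-category results.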
DOI: 10.1016/j.outlook.2025.102472