Toward equitable documentation: Evaluating ChatGPT's role in identifying and rephrasing stigmatizing language in electronic health records

Zhihong Zhang, Jihye Kim Scroggins, Sarah Harkins, Ismael Ibrahim Hulchafo, Hans Moen, Michele Tadiello, Veronica Barcelona, Maxim Topaz

Published: 01 Jul 2025 · Last Modified: 05 Jan 2026 · Nursing Outlook · CC BY-SA 4.0
Abstract: Stigmatizing language in electronic health records (EHRs) damages clinician–patient relationships and reinforces health disparities. This study assessed ChatGPT's ability to reduce stigmatizing language in clinical notes. We analyzed 140 clinical notes and 150 examples of stigmatizing language from two urban hospitals. ChatGPT-4 identified and rephrased stigmatizing language. Identification performance was evaluated using precision, recall, and F1 score, with human expert annotations as the gold standard. Rephrasing quality was rated by experts on a three-point Likert scale for de-stigmatization, faithfulness, conciseness, and clarity. ChatGPT showed poor overall identification (micro-F1 = 0.51) but moderate-to-high performance within individual stigmatizing-language categories (micro-F1 = 0.69–0.91). Rephrasings scored 2.7 for de-stigmatization, 2.8 for faithfulness, and 3.0 for both conciseness and clarity. Prompt design significantly affected ChatGPT's performance. While ChatGPT has limitations as an automatic identification tool, with appropriate prompt design and human oversight it can support real-time identification and rephrasing of stigmatizing language in EHRs.
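As a minimal sketch of how such a pipeline can be wired up (not the authors' actual protocol: the prompt wording, category names, and toy labels below are illustrative assumptions), the following Python fragment prompts a GPT-4-class model to flag stigmatizing phrases in a note, then scores multi-category identification against expert annotations with micro-averaged precision, recall, and F1:

```python
# Hedged sketch only: the prompt text, category layout, and labels are
# illustrative assumptions, not the study's actual materials.
import numpy as np
from openai import OpenAI
from sklearn.metrics import precision_recall_fscore_support

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are reviewing a clinical note. List every phrase that uses "
    "stigmatizing language about the patient, one per line, and suggest "
    "a neutral rephrasing for each. If there is none, reply 'NONE'.\n\n"
    "Note:\n{note}"
)

def flag_stigmatizing_language(note: str) -> list[str]:
    """Ask the model which phrases in a note are stigmatizing."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(note=note)}],
        temperature=0,  # deterministic output aids reproducibility
    )
    text = response.choices[0].message.content.strip()
    return [] if text == "NONE" else [ln.strip() for ln in text.splitlines()]

# Toy evaluation: rows are notes, columns are hypothetical stigmatizing-
# language categories (1 = category present). gold = expert annotation,
# pred = the model's flags mapped onto the same categories.
gold = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 1, 0]])
pred = np.array([[1, 0, 0],
                 [0, 1, 1],
                 [1, 1, 0]])

p, r, f1, _ = precision_recall_fscore_support(gold, pred, average="micro")
print(f"micro-precision={p:.2f} micro-recall={r:.2f} micro-F1={f1:.2f}")
```

Micro-averaging pools true positives, false positives, and false negatives across all categories before computing the scores, which is how a single overall micro-F1 can be reported alongside per-category figures as in the abstract.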