Extracting language information from clinical notes using large language models

Lingfei Qian, Na Hong, Yujia Zhou, Qianqian Xie, Ruey-Ling Weng, Pitchaya Chairuengjitjaras, Xinsong Du, John Lian, Gad A. Marshall, Suzanne V. Blackley, John Novoa-Laurentiev, Yakeel T. Quiroz, Tae Youn Kim, Nicole Adams, Michelle L. Dossett, Li Zhou, Hua Xu

Published: 01 Jan 2026, Last Modified: 24 Nov 2025
Venue: International Journal of Medical Informatics
License: CC BY-SA 4.0
Abstract: Highlights
• Developed an LLM-based NER pipeline to extract nuanced patient language status from clinical notes.
• Cross-site validation (YNHH, MIMIC) showed strong zero-shot accuracy for GPT-4o and robust generalization for LLaMA3.
• Enables scalable, fine-grained language data extraction to support equitable, language-focused healthcare research.