Extracting language information from clinical notes using large language models

Published: 01 Jan 2026, Last Modified: 04 Nov 2025, Int. J. Medical Informatics 2026, CC BY-SA 4.0
Abstract: Highlights
• Developed an LLM-based NER pipeline to extract nuanced patient language status from clinical notes.
• Cross-site validation (YNHH, MIMIC) showed strong zero-shot accuracy for GPT-4o and robust generalization for LLaMA3.
• Enables scalable, fine-grained language data extraction to support equitable, language-focused healthcare research.