Bridging the Domain Divide: Supervised vs. Zero-Shot Clinical Section Segmentation from MIMIC-III to Obstetrics

ACL ARR 2025 May Submission 2205 Authors

18 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Clinical notes contain vital patient information organized into sections such as "History of Present Illness" and "Medications". Recognizing these sections supports clinical decision-making, yet most existing segmentation approaches rely on supervised models trained on large public corpora (e.g., MIMIC-III), which may not generalize effectively to specialized domains such as obstetrics. In this paper, we advance clinical section segmentation through three key contributions: (1) we introduce a novel, de-identified dataset of obstetrics clinical notes; (2) we systematically evaluate transformer-based supervised models on both in-domain (MIMIC-III) and out-of-domain (obstetrics) data; and (3) we present the first head-to-head comparison of these supervised models with zero-shot large language models (LLMs): Llama, Mistral, and Qwen. Our results show that while supervised models significantly outperform LLMs on in-domain MIMIC-III data, their performance degrades substantially in the out-of-domain setting, where the best zero-shot LLM (Llama 3.3-70B-Instruct) surpasses all supervised baselines even before our hallucination correction step is applied. Once hallucinated section headers are corrected, zero-shot performance improves further, with three of the four LLMs outperforming the best supervised model, demonstrating the viability of zero-shot models for specialized clinical domains. These findings underscore the challenge of transferring models trained on broad public corpora to underexplored clinical subdomains and highlight the strong potential of zero-shot approaches when labeled data is scarce.
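The abstract mentions a hallucination correction step for LLM-predicted section headers without detailing its mechanism. Purely as an illustrative sketch, not the paper's actual method: one simple way to implement such a step is to fuzzy-match each predicted header against a fixed inventory of canonical headers and discard predictions that match nothing. The header list, function name, and similarity cutoff below are all assumptions.

```python
from difflib import get_close_matches

# Hypothetical canonical header inventory; the paper's actual label set is
# not given in the abstract.
CANONICAL_HEADERS = [
    "History of Present Illness",
    "Medications",
    "Past Medical History",
    "Physical Exam",
    "Assessment and Plan",
]

def correct_header(predicted: str, cutoff: float = 0.6):
    """Snap an LLM-predicted (possibly hallucinated) section header onto the
    closest canonical header; return None when nothing is similar enough."""
    lowered = {h.lower(): h for h in CANONICAL_HEADERS}
    match = get_close_matches(predicted.strip().lower(), list(lowered),
                              n=1, cutoff=cutoff)
    return lowered[match[0]] if match else None

# A hallucinated variant is mapped back to a known header:
print(correct_header("Hx of Present Illness"))    # History of Present Illness
print(correct_header("Chief Complaint Summary"))  # None (no close match)
```

Python's standard-library difflib keeps the sketch dependency-free; a real pipeline would tune the cutoff and expand the canonical inventory to the paper's full label set.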
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: healthcare applications, clinical NLP
Languages Studied: English
Submission Number: 2205