Towards Understandable Generative Information Extraction: A Case Study on Making LLMs More Understandable EHR Profilers
Abstract: Enhancing the understandability of Information Extraction (IE) outputs can improve its utility and adoption across critical sectors such as healthcare. Unlike comparable tasks such as Question Answering (QA) and Summarization, IE remains largely understudied in this context. In this work, we introduce a method that incorporates evidentiality in the form of textual snippets to substantiate the extracted IE outputs (i.e., concepts and relations). We propose a prompt-then-tune pipeline that sequentially extracts IE outputs and corresponding evidence passages from unstructured electronic health records (EHRs). This pipeline supports an ensemble of large language models (LLMs), self-verification, and fine-tuning for generating patient profiles from EHR notes. Beyond evidence-based enrichment, we advocate for semantic-alignment metrics over exact-match metrics, as the latter constrain LLM expressiveness. Our evaluation on three EHR-derived datasets shows that a small-LLM ensemble outperforms stronger standalone LLMs by up to 2.4% on average across IE tasks. Additionally, we find that iterative prompting and smaller batch sizes not only reduce the complexity of intermediate batch processing but also significantly improve multi-task performance. We further demonstrate that training on synthetic data helps mitigate data scarcity, narrowing the performance gap with larger models (and in some cases surpassing them).
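To make the abstract's point about semantic-alignment versus exact-match evaluation concrete, the sketch below scores extracted concepts against gold annotations by embedding similarity rather than exact string match. This is an illustrative example only, not the metric used in the paper: the sentence-transformers encoder ("all-MiniLM-L6-v2"), the 0.8 similarity threshold, and the max-similarity matching are all assumptions.

```python
# Minimal sketch of a semantic-alignment metric for IE outputs.
# Assumptions (not from the paper): encoder choice, threshold, and matching rule.
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def semantic_f1(predicted: list[str], gold: list[str], threshold: float = 0.8) -> float:
    """Score extracted concepts against gold concepts via embedding similarity,
    so paraphrased extractions are not penalized as they would be by exact match."""
    if not predicted or not gold:
        return 0.0
    # Pairwise cosine similarities: shape (len(predicted), len(gold)).
    sim = util.cos_sim(_model.encode(predicted), _model.encode(gold))
    # A prediction counts as correct if it aligns with at least one gold concept,
    # and a gold concept counts as recovered if some prediction aligns with it.
    tp_pred = sum(float(sim[i].max()) >= threshold for i in range(len(predicted)))
    tp_gold = sum(float(sim[:, j].max()) >= threshold for j in range(len(gold)))
    precision = tp_pred / len(predicted)
    recall = tp_gold / len(gold)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Example: paraphrases such as "heart attack" vs. "myocardial infarction" fail
# exact match but can align under embedding similarity.
print(semantic_f1(["heart attack", "type 2 diabetes"],
                  ["myocardial infarction", "diabetes mellitus type 2"]))
```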
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction, Interpretability and Analysis of Models for NLP, NLP Applications, Machine Learning for NLP, Resources and Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 4113