Concept-Enhanced Automatic ICD Coding using Large Language Models

Md Shahrar Fatemi, Zhan Shi, Joel Saltz, Klaus Mueller, Tengfei Ma

Published: 27 Nov 2025, Last Modified: 09 Dec 2025, ML4H 2025 Poster, CC BY 4.0
Keywords: ICD Coding, Medical Concepts, Large Language Models (LLMs).
TL;DR: Concept-Enhanced Coding (CEC) is a two-stage framework that leverages clinically meaningful concepts, organized at multiple levels, to guide Large Language Models in ICD coding, achieving state-of-the-art performance and improved interpretability.
Track: Proceedings
Abstract: Automatic ICD coding is the task of assigning disease or procedure codes to clinical notes from patients' electronic health records. Large language models (LLMs) have been explored for this task, but none of the existing LLM-based approaches has outperformed traditional deep learning models, owing to their limited ability to model concepts. Existing methods for ICD coding often use code descriptions or synonyms to enhance performance. In this paper, we propose to use concepts to expand the label space. Using the hierarchy of ICD codes, we construct concepts associated with the codes at different levels and employ fine-tuned LLMs to obtain concept scores, which are then used for code prediction. Experiments on the MIMIC-III-50 and MIMIC-III-rare50 datasets demonstrate that our models achieve excellent performance and substantially outperform previous state-of-the-art models. Although the current evaluation is limited in scope and constrained by computational cost, the results provide strong evidence for the potential of concept-driven LLM frameworks to advance automated medical coding.
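The abstract does not say how concept scores at different hierarchy levels are combined into a code prediction. The Python sketch below shows one possible reading, under loudly stated assumptions. It uses dotted ICD-9 codes (the coding system used in MIMIC-III) and treats each hierarchy level (the category before the dot, then each additional digit) as one concept. It assumes a fine-tuned LLM has already produced a score in [0, 1] per concept. The helper names `hierarchy_levels`, `code_score`, and `predict_codes`, the product aggregation rule, and the 0.5 threshold are all hypothetical illustrations, not the authors' CEC method.

```python
from math import prod


def hierarchy_levels(code: str) -> list[str]:
    """Split a dotted ICD-9 code into hierarchy levels, most general first.

    Assumption (hypothetical): each digit after the dot is one level deeper,
    e.g. "250.01" -> ["250", "250.0", "250.01"].
    """
    category, _, subdivision = code.partition(".")
    levels = [category]
    for i in range(1, len(subdivision) + 1):
        levels.append(f"{category}.{subdivision[:i]}")
    return levels


def code_score(code: str, concept_scores: dict[str, float]) -> float:
    """Combine per-level concept scores into one code score (hypothetical rule).

    Levels without a scored concept are skipped; the remaining scores are
    multiplied, so a code scores high only if every scored level is supported.
    A code with no scored levels gets 0.0.
    """
    scores = [concept_scores[lvl] for lvl in hierarchy_levels(code)
              if lvl in concept_scores]
    return prod(scores) if scores else 0.0


def predict_codes(candidates: list[str], concept_scores: dict[str, float],
                  threshold: float = 0.5) -> list[str]:
    """Return the candidate codes whose combined score reaches the threshold."""
    return [c for c in candidates if code_score(c, concept_scores) >= threshold]


if __name__ == "__main__":
    # Made-up scores standing in for LLM outputs on a single clinical note.
    scores = {"428": 0.95, "428.0": 0.9,
              "250": 0.8, "250.0": 0.6, "250.01": 0.3}
    print(predict_codes(["428.0", "250.01", "401.9"], scores))
```

With these made-up scores, "428.0" combines to about 0.855 and is predicted. "250.01" drops to 0.144 because its most specific concept is weakly supported, and "401.9" has no scored concepts at all. A product is only one choice; averaging or a learned combiner over the level scores would be equally plausible readings of the abstract.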
General Area: Models and Methods
Specific Subject Areas: Natural Language Processing, Explainability & Interpretability
Supplementary Material: zip
Data And Code Availability: Yes
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Submission Number: 153