Automatic Mapping of Clinical Classification Systems Using Large Language Models

ACL ARR 2025 May Submission 3226 Authors

19 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Mapping between clinical classification systems, such as versions of the International Classification of Diseases (ICD), is crucial for data analysis, but doing it manually is labor-intensive and does not scale. We identify two key issues with standard automatic methods based on pre-trained transformer encoders: (1) linguistic variation and (2) differences in granularity across ICD versions. To address these issues, we propose a novel method that combines the representational capacity of pre-trained encoders with the reasoning abilities of large language models (LLMs). For each ICD code, we generate (1) hierarchy-augmented and (2) LLM-generated descriptions that capture rich semantic nuances, addressing linguistic variation. Furthermore, where the source code has been mapped to a parent code, we leverage the reasoning ability of the LLM to generate the final mapping using multiple-choice-style prompts. Empirically, we demonstrate the effectiveness of the proposed method by performing chapter-wise mapping between ICD-9-CM (Clinical Modification) and ICD-10-CM (Clinical Modification), and between ICD-10-AM (Australian Modification) and ICD-11. Our source code is publicly available at: [github link on camera-ready version].
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: healthcare applications, clinical NLP
Contribution Types: Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3226
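
The abstract names two ingredients, hierarchy-augmented descriptions and multiple-choice-style prompts, without giving implementation details. The following is a minimal illustrative sketch of what those two steps might look like, not the authors' implementation. The function names, the " > " separator, the prompt wording, and the small toy hierarchy are all assumptions introduced here for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy hierarchy (assumption, for illustration only): child code -> parent code.
PARENTS: Dict[str, Optional[str]] = {
    "E08-E13": None,
    "E11": "E08-E13",
    "E11.9": "E11",
}

# Toy descriptions keyed by code (a tiny illustrative subset).
DESCRIPTIONS: Dict[str, str] = {
    "E08-E13": "Diabetes mellitus",
    "E11": "Type 2 diabetes mellitus",
    "E11.9": "Type 2 diabetes mellitus without complications",
}


def hierarchy_augmented_description(
    code: str,
    descriptions: Dict[str, str],
    parents: Dict[str, Optional[str]],
) -> str:
    """Join the descriptions of a code's ancestors (root first) with its own."""
    chain: List[str] = []
    current: Optional[str] = code
    while current is not None:
        chain.append(descriptions[current])
        current = parents.get(current)
    return " > ".join(reversed(chain))


def multiple_choice_prompt(
    source_text: str,
    candidates: List[Tuple[str, str]],
) -> str:
    """Format candidate target codes as lettered options for an LLM."""
    lines = [
        f"Source code description: {source_text}",
        "Which target code is the best match?",
    ]
    for i, (code, text) in enumerate(candidates):
        letter = chr(ord("A") + i)  # A, B, C, ...
        lines.append(f"{letter}. {code}: {text}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


if __name__ == "__main__":
    augmented = hierarchy_augmented_description("E11.9", DESCRIPTIONS, PARENTS)
    print(augmented)
    # Hypothetical source description; the candidates are the toy codes above.
    print(multiple_choice_prompt(
        "Type II diabetes without mention of complication",
        [("E11", DESCRIPTIONS["E11"]), ("E11.9", DESCRIPTIONS["E11.9"])],
    ))
```

In a sketch like this, the augmented string would be what gets passed to the encoder, and the lettered prompt would only be needed when the encoder's best match is a parent code whose children must be disambiguated.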