Mapping Extracted Free-Text Primary Diagnoses to ICD-10 and SNOMED-CT Using SciSpacy: A Performance Evaluation

Published: 19 Aug 2025, Last Modified: 12 Oct 2025, Venue: BHI 2025, License: CC BY 4.0
Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: Natural language processing, scispaCy, ICD-10, SNOMED, MIMIC-IV, Clinical coding, Diagnosis mapping, Medical concept extraction
TL;DR: We built a scalable NLP pipeline using scispaCy to map diagnoses from MIMIC-IV notes to standard codes. It achieved 98.1% UMLS coverage and mapped 80.1% of patients to ICD-10, offering a transparent, efficient, lower-bias alternative to black-box LLMs.
Abstract: Accurate extraction and standardization of clinical diagnoses from unstructured electronic health records (EHRs) remain a critical challenge in healthcare data science. This study is among the first to evaluate the performance of scispaCy for mapping diagnosis concepts extracted from MIMIC-IV clinical notes to standardized medical codes. Our natural language processing (NLP) pipeline uses scispaCy to map diagnosis concepts to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) and then crosswalks these CUIs to ICD-10 and SNOMED-CT codes. Applied to the MIMIC-IV dataset, the pipeline demonstrated robust coverage, mapping 94.1% of extracted diagnosis concepts to UMLS CUIs across 98% of patients, with 80.3% of patients having CUIs mapped to ICD-10. Exact ICD-10 code matches between model output and MIMIC-IV diagnosis records were observed in 58.3% of patients, while a hierarchical category-level roll-up comparison raised the match rate to 83.1%, reflecting clinical coding complexities. The pipeline's reliance on UMLS CUIs offers versatility across coding standards, and its design supports integration with existing EHR systems on standard hardware, enhancing accessibility. Because it relies on structured vocabularies rather than generative deep learning, our approach poses a lower risk of hallucination and reduces gender and racial bias compared to large language models. This work highlights the promise of combining rule-based and statistical NLP methods for scalable, transparent, and clinically relevant diagnosis mapping, with the potential to improve research applications and clinical decision support systems.
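The difference between the exact-match and category-level roll-up figures in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that "roll-up" means truncating an ICD-10 code to its three-character category, that a patient counts as matched when any predicted code agrees with any recorded code, and that codes may appear with or without the decimal point. The function names and the toy patient data are hypothetical.

```python
from typing import Dict, Set


def normalize(code: str) -> str:
    """Strip the decimal point and whitespace, and upper-case the code."""
    return code.replace(".", "").strip().upper()


def icd10_category(code: str) -> str:
    """Roll an ICD-10 code up to its three-character category (I50.9 -> I50)."""
    return normalize(code)[:3]


def patient_match(predicted: Set[str], recorded: Set[str], rollup: bool = False) -> bool:
    """Assumed criterion: any predicted code agrees with any recorded code."""
    key = icd10_category if rollup else normalize
    return bool({key(c) for c in predicted} & {key(c) for c in recorded})


def match_rate(pred: Dict[str, Set[str]], gold: Dict[str, Set[str]],
               rollup: bool = False) -> float:
    """Fraction of patients present in both mappings with at least one match."""
    patients = [p for p in gold if p in pred]
    if not patients:
        return 0.0
    hits = sum(patient_match(pred[p], gold[p], rollup) for p in patients)
    return hits / len(patients)


# Toy data (hypothetical): pipeline output vs. recorded diagnoses.
pred = {"p1": {"I50.9"}, "p2": {"J18.9"}, "p3": {"E11.9"}}
gold = {"p1": {"I509"}, "p2": {"J181"}, "p3": {"N179"}}

exact_rate = match_rate(pred, gold)                # only p1 matches exactly
rollup_rate = match_rate(pred, gold, rollup=True)  # p1 and p2 share a category
print(f"exact: {exact_rate:.3f}, roll-up: {rollup_rate:.3f}")
```

In this toy case, patient p2 is predicted J18.9 but recorded J18.1, so the codes disagree at full granularity yet agree on the category J18. Differences of this kind are one plausible reason a category-level comparison yields a higher match rate than exact matching.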
Track: 4. Clinical Informatics
Registration Id: ZMNP8SGJ6MF
Submission Number: 300