Keywords: Semantic ID, Medical Ontology, Representation, Explainability
TL;DR: We propose Semantic Medical ID (SMI), a framework that integrates hierarchical semantics from an expert-defined medical ontology into embeddings.
Abstract: Recent advances in generative AI have accelerated the use of language models (LMs) for clinical prediction tasks. However, existing biomedical LMs often struggle to capture clinically meaningful relationships among medical concepts, because they rely solely on data-driven learning from text and overlook domain knowledge. In this study, we propose **Semantic Medical ID (SMI)**, a novel representation framework that integrates an expert-defined medical ontology into LM-based embeddings. By leveraging the hierarchical structure of the ontology, SMI generates embeddings that preserve clinical relationships across major disease categories, subcategories, and specific conditions, enhancing interpretability for clinical end users. Experimental results demonstrate that SMI improves predictive accuracy on mortality and readmission prediction tasks. SMI also exhibits greater robustness under cross-hospital distribution shifts, highlighting its effectiveness in producing clinically generalizable representations.
Submission Number: 55