BoxLM: Unifying Structures and Semantics of Medical Concepts for Diagnosis Prediction in Healthcare

Yanchao Tan; Hang Lv; Yunfei Zhan; Guofang Ma; Bo Xiong; Carl Yang



Published: 01 May 2025, Last Modified: 24 Jul 2025. ICML 2025 poster. License: CC BY 4.0

Abstract: Language Models (LMs) have advanced diagnosis prediction by leveraging the semantic understanding of medical concepts in Electronic Health Records (EHRs). Despite these advancements, existing LM-based methods often fail to capture the structures of medical concepts (e.g., hierarchy structure from domain knowledge). In this paper, we propose BoxLM, a novel framework that unifies the structures and semantics of medical concepts for diagnosis prediction. Specifically, we propose a structure-semantic fusion mechanism via box embeddings, which integrates both ontology-driven and EHR-driven hierarchical structures with LM-based semantic embeddings, enabling interpretable medical concept representations. Furthermore, in the box-aware diagnosis prediction module, an evolve-and-memorize patient box learning mechanism is proposed to model the temporal dynamics of patient visits, and a volume-based similarity measurement is proposed to enable accurate diagnosis prediction. Extensive experiments demonstrate that BoxLM consistently outperforms state-of-the-art baselines, especially achieving strong performance in few-shot learning scenarios, showcasing its practical utility in real-world clinical settings.
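The abstract's volume-based similarity measurement can be illustrated with a minimal sketch. The sketch assumes axis-aligned boxes stored as lower and upper corners. It scores a candidate diagnosis box C against a patient box P by vol(P ∩ C) / vol(P), which is the conditional-probability reading used in probabilistic box embeddings. This scoring rule, the helper names (`volume`, `intersection`, `volume_similarity`, `rank_diagnoses`) and the toy codes are hypothetical stand-ins and do not reproduce BoxLM's actual formulation. The authors' implementation is available at the linked code repository.

```python
from typing import Dict, List, Tuple

# A box is stored as (lower corner, upper corner), one coordinate per dimension.
Box = Tuple[List[float], List[float]]


def volume(box: Box) -> float:
    """Hard volume: product of side lengths, clamped at 0 so empty boxes score 0."""
    vol = 1.0
    for lo, hi in zip(*box):
        vol *= max(0.0, hi - lo)
    return vol


def intersection(a: Box, b: Box) -> Box:
    """Intersection of two boxes: max of lower corners, min of upper corners (may be empty)."""
    return ([max(x, y) for x, y in zip(a[0], b[0])],
            [min(x, y) for x, y in zip(a[1], b[1])])


def volume_similarity(patient: Box, concept: Box) -> float:
    """Hypothetical score: share of the patient box's volume that the concept box covers.

    BoxLM's own volume-based measure may differ from this.
    """
    patient_vol = volume(patient)
    if patient_vol == 0.0:
        return 0.0
    return volume(intersection(patient, concept)) / patient_vol


def rank_diagnoses(patient: Box, concepts: Dict[str, Box]) -> List[Tuple[str, float]]:
    """Score each candidate diagnosis box against the patient box, highest score first."""
    scores = [(code, volume_similarity(patient, box)) for code, box in concepts.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D boxes with made-up codes, for illustration only.
    patient = ([0.0, 0.0], [2.0, 2.0])
    candidates = {
        "code_A": ([1.0, 1.0], [3.0, 3.0]),
        "code_B": ([0.0, 0.0], [1.0, 2.0]),
        "code_C": ([5.0, 5.0], [6.0, 6.0]),
    }
    print(rank_diagnoses(patient, candidates))
    # prints [('code_B', 0.5), ('code_A', 0.25), ('code_C', 0.0)]
```

The hard `max(0, ·)` volume is used here so the scores are easy to verify by hand. Trainable box models usually replace it with a smooth function such as softplus, because the hard version gives zero gradient whenever two boxes do not overlap.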

Lay Summary: Can we teach machines to predict a patient’s future health conditions not just by memorizing data, but by truly understanding medical knowledge? Recent language models have made great progress in medical diagnosis by learning the meaning of disease names from health records. But they often miss an important piece: how medical conditions are related—such as which diseases fall under the same category, or how one condition may lead to another. Our work introduces a new approach called **BoxLM**, which combines two types of knowledge: the *meaning* of medical terms and their *structured relationships* drawn from medical ontologies and real hospital data. Instead of treating each disease as a single point, BoxLM represents them as regions with boundaries in space, allowing the model to naturally capture complex relationships like overlap, inclusion, and hierarchy between conditions. BoxLM also tracks how a patient’s health evolves over time, learning patterns that help predict future diagnoses. It performs especially well when only limited patient history is available, making it a promising tool for improving care in data-scarce clinical settings.
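The idea of representing diseases as regions rather than points, with relationships such as overlap, inclusion and hierarchy, can be made concrete with a small sketch. The sketch assumes each concept starts from a point embedding (for example, one produced by a language model) and is widened into an axis-aligned box by a per-dimension offset. "Inclusion" is then read off as the share of one box's volume that lies inside another. The center/offset construction, the helper names (`box_from_point`, `containment`, `relation`) and the coordinates are hypothetical illustrations and are not BoxLM's published method.

```python
from typing import List, Tuple

# A box is stored as (lower corner, upper corner), one coordinate per dimension.
Box = Tuple[List[float], List[float]]


def box_from_point(center: List[float], offset: List[float]) -> Box:
    """Hypothetical: grow a point embedding into a box by |offset| in each direction."""
    lower = [c - abs(o) for c, o in zip(center, offset)]
    upper = [c + abs(o) for c, o in zip(center, offset)]
    return lower, upper


def volume(box: Box) -> float:
    """Product of side lengths; an empty box (upper < lower somewhere) has volume 0."""
    vol = 1.0
    for lo, hi in zip(*box):
        vol *= max(0.0, hi - lo)
    return vol


def intersection(a: Box, b: Box) -> Box:
    """Elementwise max of lower corners and min of upper corners."""
    return ([max(x, y) for x, y in zip(a[0], b[0])],
            [min(x, y) for x, y in zip(a[1], b[1])])


def containment(inner: Box, outer: Box) -> float:
    """Share of `inner`'s volume lying inside `outer` (1.0 means fully nested)."""
    inner_vol = volume(inner)
    if inner_vol == 0.0:
        return 0.0
    return volume(intersection(inner, outer)) / inner_vol


def relation(a: Box, b: Box, tol: float = 1e-9) -> str:
    """Classify how box `a` sits relative to box `b`: inside, overlap, or disjoint."""
    ratio = containment(a, b)
    if ratio >= 1.0 - tol:
        return "inside"
    if ratio > 0.0:
        return "overlap"
    return "disjoint"


if __name__ == "__main__":
    # Made-up 2-D boxes: a broad category, two related sub-concepts, one unrelated concept.
    parent = box_from_point([5.0, 5.0], [5.0, 5.0])    # ([0, 0], [10, 10])
    child_a = box_from_point([2.5, 2.5], [1.5, 1.5])   # ([1, 1], [4, 4])
    child_b = box_from_point([4.5, 4.5], [1.5, 1.5])   # ([3, 3], [6, 6])
    other = box_from_point([20.5, 20.5], [0.5, 0.5])   # ([20, 20], [21, 21])
    print(relation(child_a, parent))   # inside
    print(relation(parent, child_a))   # overlap
    print(relation(child_a, child_b))  # overlap
    print(relation(other, parent))     # disjoint
```

Containment is asymmetric. A sub-concept box can sit entirely inside its category box while the category box is only partly inside the sub-concept. This asymmetry is what lets boxes encode "is-a" hierarchies, which a single point embedding cannot express.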

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Link To Code: https://github.com/Melinda315/BoxLM

Primary Area: Applications->Health / Medicine

Keywords: Diagnosis prediction, Box embeddings, Language model, Hierarchy modeling

Submission Number: 6015
