BoxCare: A Box Embedding Model for Disease Representation and Diagnosis Prediction in Healthcare Data
Abstract: Diagnosis prediction is becoming crucial to develop healthcare plans for patients based on Electronic Health Records (EHRs). Existing works usually enhance diagnosis prediction via learning accurate disease representation, where many of them try to capture inclusive relations based on the hierarchical structures of existing disease ontologies such as those provided by ICD-9 codes. However, they overlook exclusive relations that can reflect different and complementary perspectives of the ICD-9 structures, and thus fail to accurately represent relations among diseases and ICD-9 codes. To this end, we propose to project disease embeddings and ICD-9 code embeddings into boxes, where a box is an axis-aligned hyperrectangle with a geometric region and two boxes can clearly "include" or "exclude" each other. Upon box embeddings, we further obtain patient embeddings via aggregating the disease representations for diagnosis prediction. Extensive experiments on two real-world EHR datasets show significant performance gains brought by our proposed framework, yielding average improvements of 6.04% for diagnosis prediction over state-of-the-art competitors.
Loading