Unveiling Discrete Clues: Superior Healthcare Predictions for Rare Diseases

Published: 29 Jan 2025, Last Modified: 29 Jan 2025
Venue: WWW 2025 Oral
License: CC BY-NC-ND 4.0
Track: User modeling, personalization and recommendation
Keywords: Discrete modeling, Healthcare prediction, Rare disease
TL;DR: In this paper, we introduce UDC, an innovative framework aimed at enhancing the representation semantics of rare diseases.
Abstract: Accurate healthcare prediction is essential for improving patient outcomes. Existing research primarily leverages sophisticated frameworks such as attention mechanisms or graph neural networks to capture the intricate collaborative (CO) signals inherent in electronic health records. However, prediction for rare diseases remains challenging because these diseases co-occur too rarely to yield informative CO signals. To address this issue, this paper proposes UDC, a novel method that unveils discrete clues to bridge textual knowledge and CO signals within a unified semantic space, thereby enriching the representation semantics of rare diseases. Specifically, we focus on two key sub-problems: (1) acquiring distinguishable discrete codes for precise disease representation, and (2) achieving semantic alignment between textual knowledge and CO signals at the code level. For the first sub-problem, we refine the standard vector quantization (VQ) process to make it condition-aware. Additionally, we develop an advanced contrastive learning approach in the decoding stage that uses synthetic and mixed-domain targets as hard negatives, enriching the perceptibility of the reconstructed representation for downstream tasks. For the second sub-problem, we introduce a novel codebook update strategy based on co-teacher distillation. This approach provides bidirectional supervision between textual knowledge and CO signals, thereby aligning semantically equivalent information in a shared discrete latent space. Extensive experiments on two tasks across three datasets show that UDC significantly improves healthcare prediction performance for both rare and common diseases.
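The abstract names three building blocks without giving their formulas. The sketch below shows generic, textbook versions of each, under loudly stated assumptions, and is not the authors' implementation: (1) a plain nearest-neighbour VQ lookup, which is the standard process UDC starts from, not its condition-aware refinement; (2) an InfoNCE contrastive loss in which extra hard negatives are appended, with a hypothetical `mix` helper standing in for the paper's mixed-domain targets; (3) a symmetric KL divergence between two code-assignment distributions, as one plausible form of bidirectional (co-teacher) supervision between a text view and a CO view. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Standard (unconditioned) VQ step: snap z to its nearest codebook entry."""
    index = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return index, codebook[index]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mix(a: Vector, b: Vector, lam: float) -> List[float]:
    """Hypothetical 'mixed' hard negative: convex combination lam*a + (1-lam)*b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE: negative log-softmax score of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def softmax(logits: Vector) -> List[float]:
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Vector, q: Vector) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def co_teacher_loss(text_logits: Vector, co_logits: Vector) -> float:
    """Assumed bidirectional supervision: each view's code distribution teaches the other."""
    p, q = softmax(text_logits), softmax(co_logits)
    return 0.5 * (kl(p, q) + kl(q, p))


codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
idx, code = vq_quantize([0.9, 0.2], codebook)

anchor, positive, easy = [1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]
hard = mix(positive, easy, 0.8)  # lies close to the positive, so it is harder to reject
loss_easy = info_nce(anchor, positive, [easy])
loss_hard = info_nce(anchor, positive, [easy, hard])
```

Here `[0.9, 0.2]` is assigned to code index 1, and adding the mixed negative raises the contrastive loss, because it competes closely with the positive. That pressure is the intuition behind using hard negatives to sharpen the decoded representation.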
Submission Number: 305
