Abstract: Insufficient data and data lacking the diversity to represent the general public is a common challenge when modelling diagnosis prediction. We consider a much larger and more diverse database of commercial Electronic Health Records than what is prevalent in the literature. We formulate a simplified version of diagnosis prediction that focuses on major developments in medical histories of patients. To this end, we leverage Auto-Regressive Self-Attention models that have seen promising applications in language modelling and extend them to incorporate ontological representations of medical codes. Additionally, we include time-intervals between diagnoses into the attention calculation. We evaluate models and baselines at different levels of diagnostic granularity and our results suggest that using very detailed clinical classifications does not significantly degrade performance, possibly allowing their use in practice. Our model outperforms all baselines and we suggest that leveraging the ontology for generating diagnosis representations is mostly helpful for rare diagnoses.
0 Replies
Loading