Fusing Echocardiography Images and Medical Records for Continuous Patient Stratification

Published: 27 Apr 2024, Last Modified: 15 May 2024, MIDL 2024 Short Papers, CC BY 4.0
Keywords: Multimodal, contrastive learning, transformer, foundation model, cardiac ultrasound, health records, hypertension
Abstract: Deep learning now enables automatic and robust extraction of cardiac function descriptors from echocardiographic sequences, such as ejection fraction or strain. These descriptors provide fine-grained information that physicians consider, in conjunction with global variables from the clinical record, to assess patients' condition. Drawing on novel transformer models applied to tabular data (e.g. variables from electronic health records), we propose a method that considers descriptors extracted from medical records and echocardiograms to learn a representation of hypertension, a difficult-to-characterize and highly prevalent cardiovascular pathology. Our method first embeds each descriptor separately using modality-specific approaches. These embeddings are fed as tokens to a transformer encoder, which combines them into a unified representation of the patient to predict a clinical rating. This task is formulated as an ordinal classification to enforce a pathological continuum in the representation space. We observe trends along this continuum for a cohort of 239 hypertensive patients to describe the gradual effects of hypertension on cardiac function descriptors. Our analysis shows that i) pretrained weights from a foundation model allow the model to reach good performance (83% accuracy) even with limited data ($<$ 200 training samples), ii) trends across the population are reproducible across training runs, and iii) for descriptors known to interact with hypertension, patterns are consistent with prior physiological knowledge.
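The ordinal formulation of the clinical-rating prediction can be sketched as follows. This is a minimal illustration of one common way to cast a rating over K ordered levels as K-1 cumulative binary targets; the paper does not specify its exact ordinal loss, so the encoding below is an assumption, not the authors' implementation.

```python
# Hypothetical sketch of ordinal-classification targets for a clinical rating
# with `num_levels` ordered severity levels (an assumption; the paper's exact
# ordinal formulation is not given in the abstract).

def encode_ordinal(rating: int, num_levels: int) -> list:
    """Encode a rating in {0, ..., num_levels-1} as num_levels-1 cumulative
    binary targets: target j is 1 iff rating > j. Adjacent severity levels
    share all but one target, which encourages a continuum in the
    representation space."""
    return [1 if rating > j else 0 for j in range(num_levels - 1)]

def decode_ordinal(probs: list, threshold: float = 0.5) -> int:
    """Recover the predicted level by counting how many cumulative
    thresholds the per-target probabilities exceed."""
    return sum(1 for p in probs if p > threshold)
```

For example, with four severity levels, a rating of 2 becomes the targets `[1, 1, 0]`, and sigmoid outputs `[0.9, 0.7, 0.2]` decode back to level 2.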
Submission Number: 6