Hybrid Early Fusion for Multi-Modal Biomedical Representations

Published: 02 Nov 2023, Last Modified: 18 Dec 2023UniReps PosterEveryoneRevisionsBibTeX
Keywords: multimodal, representation learning, deep learning, biomedical data
TL;DR: Novel fusion method to a) preserve structural information, b) capture cross-modal interactions, c) handle missing modalities, and d) enable easy model inspection
Abstract: Technological advances in medical data collection such as high-resolution histopathology and high-throughput genomic sequencing have contributed to the rising requirement for multi-modal biomedical modelling, specifically for image, tabular, and graph data. Most multi-modal deep learning approaches use modality-specific architectures that are trained separately and cannot capture the crucial cross-modal information that motivates the integration of different data sources. This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet) – a flexible multi-modal fusion architecture, which a) preserves modality-specific structural information, b) captures the cross-modal interactions and structural information in a shared latent space, c) can effectively handle missing modalities during training and inference, and d) enables intuitive model inspection by learning on the raw data input instead of opaque embeddings. We conduct multi-modal survival analysis on Whole Slide Images and Multi-omic data on four cancer cohorts of The Cancer Genome Atlas (TCGA). HEALNet achieves state-of-the-art performance, substantially improving over both uni-modal and recent multi-modal baselines, whilst being robust in scenarios with missing modalities.
Track: Extended Abstract Track
Submission Number: 9