Keywords: Medical Imaging, Multimodal Fusion, Attention, Deep Learning
Abstract: Multimodal fusion learning (MFL), a paradigm for jointly learning from heterogeneous data sources, has shown great potential across fields such as medicine, science, and engineering. It is especially desirable in the medical domain, where disparate data modalities such as imaging, clinical records, and omics must be combined. However, existing MFL strategies face several major challenges. First, they struggle to capture complex cross-modal interactions effectively. Second, they are often designed and evaluated for narrow, fixed modality configurations (e.g., imaging only, or specific pairs such as image and omics or image and clinical text), which limits evidence of their adaptability and generalizability to broader collections of heterogeneous medical modalities. Finally, they incur high computational costs, restricting their applicability in resource-constrained healthcare AI. To address these challenges, we propose the Efficient Hybrid-fusion Physics-inspired Attention Learning Network (EHPAL-Net), a novel, lightweight, and scalable MFL framework that integrates diverse modalities through Efficient Hybrid Fusion (EHF) layers. Each EHF layer captures rich modality-specific multi-scale spatial information, followed by a Physics-inspired Cross-modal Fusion Attention module that models fine-grained, structure-preserving cross-modal interactions, thereby learning robust, complementary shared representations. Furthermore, EHF layers are learned sequentially for each modality, making the framework adaptable and generalizable. Extensive evaluations on 15 public datasets show that EHPAL-Net outperforms leading multimodal fusion methods, improving performance by up to 3.97% and lowering computational costs by up to 87.8%, enabling more effective and reliable predictions.
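As an illustration of the design described above, the following is a minimal sketch of one EHF layer. The abstract does not specify the implementation, so every detail here is an assumption: the class name `EHFLayerSketch`, the choice of depthwise convolutions at kernel sizes 3/5/7 for the multi-scale stage, and the use of standard multi-head cross-attention as a stand-in for the Physics-inspired Cross-modal Fusion Attention module are all hypothetical, not the authors' method.

```python
# Hypothetical sketch of one EHF layer. Names, kernel sizes, and the plain
# cross-attention stand-in are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class EHFLayerSketch(nn.Module):
    """Assumed design of one Efficient Hybrid Fusion layer for a single
    modality: multi-scale depthwise convolutions extract modality-specific
    spatial features, then cross-attention fuses them into a running
    shared representation."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Parallel depthwise convolutions at several scales (assumed kernels).
        self.scales = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel_size=k, padding=k // 2, groups=dim)
             for k in (3, 5, 7)]
        )
        self.mix = nn.Linear(3 * dim, dim)  # merge the multi-scale branches
        # Stand-in for the Physics-inspired Cross-modal Fusion Attention:
        # ordinary multi-head cross-attention, queries from the shared state.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, shared: torch.Tensor) -> torch.Tensor:
        # x, shared: (batch, tokens, dim)
        h = x.transpose(1, 2)                      # (batch, dim, tokens) for Conv1d
        h = torch.cat([s(h) for s in self.scales], dim=1).transpose(1, 2)
        h = self.mix(h)                            # modality-specific multi-scale features
        fused, _ = self.cross_attn(shared, h, h)   # shared state attends to this modality
        return self.norm(shared + fused)           # residual update of the shared state

# Usage: fold each modality into the shared representation one layer at a
# time, mirroring the sequential per-modality learning the abstract mentions.
dim = 64
modalities = [torch.randn(8, 16, dim), torch.randn(8, 16, dim)]
layers = [EHFLayerSketch(dim) for _ in modalities]  # one EHF layer per modality
shared = torch.zeros(8, 16, dim)                    # initial shared tokens
for layer, modality in zip(layers, modalities):
    shared = layer(modality, shared)
```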
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 20215