Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We propose a novel fusion method that adaptively integrates all modalities for medical multi-modal tasks.
Abstract: Medical multi-modal learning requires the ability to effectively fuse heterogeneous modalities. A key challenge is how to fuse modalities when data quality varies across modalities and across patients. For example, on the TCGA benchmark, the performance of the same modality can differ between cancer types. Moreover, data collected at different times and locations, or with different reagents, can introduce inter-modal differences in data quality ($i.e.$, $\textbf{Modality Batch Effect}$). In response, we propose ${\textbf{A}}$daptive ${\textbf{M}}$odality Token Re-Balan${\textbf{C}}$ing ($\texttt{AMC}$), a novel top-down dynamic multi-modal fusion approach. The core of $\texttt{AMC}$ is to quantify the importance of each modality (Top) and then fuse the modalities according to that importance (Down). Specifically, we assess the quality of each input modality and then replace uninformative tokens with inter-modal tokens accordingly. The more important a modality is, the more of its informative tokens are retained. Self-attention then integrates these mixed tokens to fuse multi-modal knowledge. Comprehensive experiments on both medical and general multi-modal datasets demonstrate the effectiveness and generalizability of $\texttt{AMC}$.
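The top-down re-balancing described in the abstract can be illustrated with a small sketch. The abstract does not say how token informativeness or modality importance is computed, so everything below is an assumption: each token carries a hypothetical non-negative informativeness score, a modality's importance is taken to be its mean token score, and a modality's replacement budget shrinks linearly as its importance approaches that of the most important modality. The names `modality_importance`, `rebalance_tokens`, and `strength` are illustrative only and do not come from the AMC code at the linked repository.

```python
from typing import Dict, List

Token = List[float]


def modality_importance(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical importance rule (not from the AMC paper): each modality's
    mean token score, normalized so all importances sum to 1.
    Assumes non-negative scores with at least one positive value."""
    means = {m: sum(s) / len(s) for m, s in scores.items()}
    total = sum(means.values())
    return {m: v / total for m, v in means.items()}


def rebalance_tokens(
    tokens: Dict[str, List[Token]],
    scores: Dict[str, List[float]],
    strength: float = 1.0,
) -> Dict[str, List[Token]]:
    """Replace a modality's least informative tokens with the most informative
    tokens from the other modalities. Less important modalities give up more
    tokens; the most important modality keeps all of its own tokens."""
    importance = modality_importance(scores)
    top = max(importance.values())
    out: Dict[str, List[Token]] = {}
    for m, toks in tokens.items():
        n = len(toks)
        # "Top": the replacement budget shrinks as modality importance grows.
        n_replace = round(n * strength * (1.0 - importance[m] / top))
        # This modality's token indices, least informative first.
        order = sorted(range(n), key=lambda i: scores[m][i])
        drop = set(order[:n_replace])
        # "Down": inter-modal donor tokens, most informative first.
        donors = sorted(
            (
                (scores[o][i], other[i])
                for o, other in tokens.items()
                if o != m
                for i in range(len(other))
            ),
            key=lambda pair: pair[0],
            reverse=True,
        )
        donor_iter = iter([tok for _, tok in donors])
        # Keep informative tokens in place; fill dropped slots with donors,
        # falling back to the original token if donors run out.
        out[m] = [next(donor_iter, toks[i]) if i in drop else toks[i] for i in range(n)]
    return out
```

For example, with two modalities of four tokens each, where `path` has token scores `[0.9, 0.8, 0.7, 0.6]` and `gene` has `[0.1, 0.4, 0.2, 0.1]`, `path` keeps all four of its tokens while `gene` gives up its three lowest-scoring tokens to the top three `path` tokens. Because every dropped slot is refilled, each modality keeps its original token count, so the sequence handed to self-attention has a fixed length.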
Lay Summary: **Motivation:** Medical research and care often rely on combining different modalities (such as medical images, genetic information, and patient records). However, these “multimodal” datasets face a critical challenge: data quality varies widely between data types and between patients. Traditional methods struggle to handle these inconsistencies, especially in complex medical scenarios where data types (such as genes and pathology scans) are very different and their relevance can change.

**Key Insight:** Not all data types (“modalities”) are equally useful for every patient or task. Some modalities may be highly informative for a specific case, while others are unreliable or irrelevant. Instead of treating all modalities the same, we need to dynamically weigh their importance and focus on the most trustworthy information.

**Our Solution:** We propose a new approach called AMC to address these challenges.

1. **Assess Modality Importance (“Top” Step):** First, the model evaluates how useful each type of data (e.g., MRI scans vs. blood test results) is for the specific task, such as cancer diagnosis, by identifying which data types contain the clearest, most relevant information for the patient or condition at hand.
2. **Replace Unreliable Data with Useful Insights (“Down” Step):** For each data type, the model then filters out uninformative or noisy parts and replaces them with insights from more reliable data types. For example, if a patient’s genetic data is of poor quality, the model may rely more on their imaging data instead. This “re-balancing” keeps the model focused on the most trustworthy information from all available sources.
3. **Smart Fusion with a Customized Model:** We design a flexible, efficient model (similar to those behind chatbots) to combine the re-balanced data. This model includes features that improve accuracy and interpretability, making it suitable for medical use, where trust and clarity are essential.

**Impact:** Tests on real-world medical datasets showed that AMC outperforms traditional methods when data quality varies. Key benefits include:

- **More Reliable Diagnoses and Predictions:** By focusing on high-quality data, the model makes more accurate decisions, which is critical for treatment and personalized care.
- **Flexibility Across Medical Fields:** AMC works well for diverse tasks, from Alzheimer’s research to cancer subtype analysis, showing its broad utility in healthcare.
- **Interpretability:** The model’s ability to quantify which data types are most important helps doctors and researchers understand why it makes certain decisions, building trust in AI-driven medical tools.

This approach could pave the way for more robust, adaptable AI in healthcare, improving how we use diverse data to understand and treat diseases.
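The final fusion step (self-attention over the mixed tokens, described in the abstract and in step 3 of the lay summary) can be sketched as plain scaled dot-product self-attention. This page does not describe AMC's actual Transformer configuration, so the sketch makes simplifying assumptions: a single head, identity query/key/value projections instead of learned weights, and a hypothetical mean-pooling readout in `fuse` that turns the mixed tokens into one fused vector. None of these names come from the AMC code.

```python
import math
from typing import Dict, List

Token = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: List[Token]) -> List[Token]:
    """Single-head scaled dot-product self-attention with Q = K = V = x
    (identity projections, a simplification for illustration)."""
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in x:
        # Attention weights of this query over every token in the sequence.
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in x])
        # Output token: attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def fuse(rebalanced: Dict[str, List[Token]]) -> List[float]:
    """Concatenate every modality's (re-balanced) tokens into one sequence,
    let self-attention mix them, then mean-pool (hypothetical readout)."""
    seq = [tok for toks in rebalanced.values() for tok in toks]
    mixed = self_attention(seq)
    d = len(mixed[0])
    return [sum(t[j] for t in mixed) / len(mixed) for j in range(d)]
```

Because each output token is a convex combination of all input tokens, a token that originally belonged to one modality can draw on information from every other modality in the sequence; in a real model, learned projections and multiple heads would replace the identity mappings used here.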
Link To Code: https://github.com/PengJieb/amc
Primary Area: Applications->Health / Medicine
Keywords: Multi-modal, Medical
Submission Number: 8709