AMST: Alternating Multimodal Skip Training

Published: 2025 · Last Modified: 07 Nov 2025 · ECML/PKDD (4) 2025 · CC BY-SA 4.0
Abstract: Multimodal learning is a field of machine learning in which models combine multiple modalities to improve learning outcomes. However, modalities can differ in data representation and complexity, which may cause learning imbalances during training. The time a modality takes to converge during training is a key indicator of such imbalance: because convergence rates differ, modalities trained simultaneously, as is common in multimodal settings, can harmfully interfere with one another's learning. To mitigate this negative impact, we propose Alternating Multimodal Skip Training (AMST), which adjusts the training frequency of each modality. This novel method not only improves the performance of conventional multimodal models that learn from fused modalities but also enhances alternating models that train each modality separately. Additionally, it outperforms state-of-the-art models while reducing training time.
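The abstract describes per-modality training frequencies but gives no implementation details. A minimal sketch of what such skip scheduling could look like in PyTorch is shown below; all module names, modalities, dimensions, and skip intervals here are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of per-modality skip training: each modality's
# encoder receives gradients only on steps divisible by its skip
# interval, so faster-converging modalities are updated less often.
class LateFusionModel(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, hidden=64, n_classes=10):
        super().__init__()
        self.encoders = nn.ModuleDict({
            "audio": nn.Linear(audio_dim, hidden),
            "visual": nn.Linear(visual_dim, hidden),
        })
        self.head = nn.Linear(hidden * 2, n_classes)

    def forward(self, batch):
        feats = [self.encoders[m](batch[m]) for m in ("audio", "visual")]
        return self.head(torch.cat(feats, dim=-1))

def skip_training_step(model, optimizer, batch, labels, step, skip_interval):
    """One training step; only modalities scheduled for this step get gradients."""
    # Freeze encoders whose turn it is to be skipped on this step.
    for name, encoder in model.encoders.items():
        active = step % skip_interval[name] == 0
        for p in encoder.parameters():
            p.requires_grad_(active)
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(batch), labels)
    loss.backward()
    optimizer.step()  # frozen encoders have no grads, so they are left unchanged
    return loss.item()

# Example usage with assumed intervals: the visual branch is updated
# every 2nd step, while the audio branch is updated on every step.
model = LateFusionModel()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
skip_interval = {"audio": 1, "visual": 2}
for step in range(100):
    batch = {"audio": torch.randn(8, 128), "visual": torch.randn(8, 512)}
    labels = torch.randint(0, 10, (8,))
    skip_training_step(model, optimizer, batch, labels, step, skip_interval)
```

How the skip intervals are chosen (e.g., from per-modality convergence rates) and whether skipping is applied to fused or alternating training are details left to the full paper; this sketch only illustrates the scheduling idea.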