Abstract: The imbalanced modality problem in multimodal learning leaves individual modalities under-optimized, owing to gradient conflicts among the modalities and the model’s preference for modalities that are easier to learn. Recent studies modulate gradients from a uni-modal perspective to boost the learning of each single modality. However, they overlook the similarity between multimodal gradient optimization and multi-objective learning, and they overemphasize competition between the modalities’ gradients as the means of improving optimization. We therefore treat imbalanced multimodal optimization as a Multi-Objective Optimization problem and propose a novel training method: Social Optimum Assisted Gradient Modulation (SOA-GM). Specifically, we use the social optimum to guide the multimodal model toward a trade-off state in which collaboration among the modalities is maximized. We also introduce envy to reduce the model’s preference for any particular modality. Finally, experiments across multiple datasets indicate that our method achieves superior performance in multimodal learning and is extendable.
External IDs:dblp:conf/icmcs/HuJSYPYX25