Abstract: In this paper, we propose PFME, a multi-loss network based on progressive fusion and a mixture of experts for multimodal sentiment analysis. PFME comprises a progressive attention fusion (PAF) module and a mixture of attention experts (MAE) module. The PAF module leverages a learnable shared query to extract modality-shared representations through cyclic iterations; in each iteration, double cross-attention lets the query progressively reinforce the sentiment dynamics of multi-level features. The MAE module employs multiple attention experts to capture complementary aspects of intra-modal information: a router assigns multi-level semantic features to one of the experts, so that across several stacked layers each modality follows an exclusive routing path that produces a modality-specific representation. In addition, we design three loss functions to improve these two modules. First, we apply a contrastive loss to high-level semantic features across modalities to strengthen inter-modal associations; second, we use an orthogonal loss in the PAF module to keep the shared-query pattern invariant; finally, we impose a balance loss on the MAE module to equalize assignment probabilities across experts. Extensive experiments on the CMU-MOSI and CMU-MOSEI datasets show that PFME achieves state-of-the-art performance.
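As a rough illustration of the expert routing and balance loss mentioned above, the following is a minimal sketch assuming a PyTorch-style top-1 router with a Switch-Transformer-style load-balancing term; the class names, dimensions, and exact loss form are illustrative assumptions, not the paper's implementation.

```python
# Minimal, illustrative sketch (not the authors' code) of top-1 expert routing
# plus a balance loss that pushes assignment probabilities toward uniformity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1Router(nn.Module):
    """Assigns each feature vector to one of `num_experts` experts."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, dim)
        logits = self.gate(x)              # (B, L, num_experts)
        probs = logits.softmax(dim=-1)     # soft routing probabilities
        expert_idx = probs.argmax(dim=-1)  # hard top-1 assignment

        # Balance loss (Switch-Transformer style, assumed here): penalize the
        # product of the fraction of tokens routed to each expert and the mean
        # routing probability, so both stay close to uniform across experts.
        num_experts = probs.size(-1)
        assign_frac = F.one_hot(expert_idx, num_experts).float().mean(dim=(0, 1))
        mean_prob = probs.mean(dim=(0, 1))
        balance_loss = num_experts * torch.sum(assign_frac * mean_prob)
        return expert_idx, probs, balance_loss


# Usage: route 16-token sequences of 128-d features among 4 hypothetical experts.
router = Top1Router(dim=128, num_experts=4)
feats = torch.randn(2, 16, 128)
idx, probs, balance_loss = router(feats)
```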