Abstract: Multi-sensory data, with its complex inter-modal relationships and temporal interactions, carries richer and more complex emotional representations for sentiment analysis. Yet effectively integrating the modalities remains a major challenge in the Multimodal Sentiment Analysis (MSA) task. We present a generalized model, the Synesthesia Transformer with Contrastive learning (STC), which uses a synesthesia attention module that lets the other modalities guide the training of the input modality. This yields a more natural and effective fusion, and STC achieves competitive results on two widely used benchmarks, CMU-MOSEI and CMU-MOSI.
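The abstract does not give the equations of the synesthesia attention module. As a rough illustration only, the sketch below shows a generic cross-modal attention step in which queries come from the input (target) modality and keys and values come from the other (guide) modalities. This is a standard technique and not necessarily how STC implements its module. It assumes identity query/key/value projections (no learned weights), a feature dimension d shared by all modalities, and a residual connection. The names `transpose`, `matmul`, `softmax_rows`, and `cross_modal_attention` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps, columns are features


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_modal_attention(target: Matrix, guides: List[Matrix]) -> Matrix:
    """Let the guide modalities re-weight information for the target modality.

    Hypothetical simplification: identity Q/K/V projections, with every
    modality already mapped to the same feature dimension d.
    """
    context = [row for g in guides for row in g]  # concatenate guides over time
    d = len(target[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    scores = [[s * scale for s in row] for row in matmul(target, transpose(context))]
    weights = softmax_rows(scores)       # each target step attends over all guide steps
    attended = matmul(weights, context)  # weighted sum of guide features
    # Residual connection keeps the target modality's own information.
    return [[t + a for t, a in zip(tr, ar)] for tr, ar in zip(target, attended)]


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]     # 2 time steps, d = 2
    audio = [[0.5, 0.5]]                # 1 time step
    visual = [[0.0, 2.0], [2.0, 0.0]]   # 2 time steps
    fused = cross_modal_attention(text, [audio, visual])
    print(len(fused), len(fused[0]))    # prints "2 2"
```

The output keeps the target modality's sequence length and feature size, so a block like this could be applied once per modality, with each modality in turn acting as the target while the remaining ones serve as guides.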