Abstract: Highlights
• Propose fine-grained multimodal fusion network for sentiment analysis.
• Extract fine-grained sentiment representations using fewer denoising tokens.
• Perform token-level alignment to facilitate representation learning and fusion.
• Generate consistent multimodal representations via correlation-aware fusion.