PGN: A Polar Geodesic Network for Multimodal Emotion Recognition

ICLR 2026 Conference Submission22329 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: multimodal learning, affective computing, emotion recognition, radial space, polarisation, transformer
TL;DR: A multimodal emotion recognition method based on emotional distribution theory.
Abstract: Multimodal emotion recognition faces semantic ambiguity, significant noise, and cross-modal interference, including interference caused by missing modalities. Psychological research supports a radial structure of emotions, yet many methods overlook this geometry and accumulate directional noise during fusion. The Polar Geodesic Network (PGN) maps modality embeddings into a radial space, performs reliability-aware geodesic fusion that preserves the circular topology, and then uses a Transformer to refine the fused representation and capture cross-dimensional interactions. Under a unified frozen-backbone protocol, PGN attains 0.6835 Accuracy and 0.6756 Weighted-F1 on MELD, and 0.7340 Accuracy and 0.690 Macro-F1 on IEMOCAP. Ablation results indicate complementary gains from the geometry-aware fusion and the subsequent Transformer. These findings show that explicit modelling in radial space improves recognition accuracy and robustness.
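The abstract describes reliability-aware geodesic fusion in a radial space only at a high level. The following is a minimal sketch that illustrates why fusing on the circle preserves circular topology. It rests on several assumptions that do not come from the submission: each modality is taken to be already projected to a single 2-D point (radius, angle), each modality has a scalar reliability weight, and a missing modality is passed as `None`. The helper names `wrap_angle`, `to_polar`, and `geodesic_fuse` are hypothetical. The fusion rule is a standard reliability-weighted Fréchet (geodesic) mean on the circle, used here as a stand-in rather than PGN's actual fusion rule. The Transformer refinement stage is omitted.

```python
import math
from typing import Optional, Sequence, Tuple


def wrap_angle(a: float) -> float:
    """Wrap an angle (radians) into the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def to_polar(x: float, y: float) -> Tuple[float, float]:
    """Map a hypothetical 2-D modality embedding (x, y) to (radius, angle)."""
    return math.hypot(x, y), math.atan2(y, x)


def geodesic_fuse(
    polar: Sequence[Optional[Tuple[float, float]]],
    reliability: Sequence[float],
    iters: int = 50,
    tol: float = 1e-10,
) -> Tuple[float, float]:
    """Reliability-weighted geodesic (Frechet) mean of angles on the circle.

    Assumption: missing modalities are given as None and are skipped, as are
    modalities with non-positive reliability. The fused radius is a plain
    reliability-weighted average of the radii.
    """
    pairs = [(p, w) for p, w in zip(polar, reliability) if p is not None and w > 0]
    if not pairs:
        raise ValueError("no available modality with positive reliability")
    total_w = sum(w for _, w in pairs)

    # Initialise from the extrinsic (weighted vector-sum) circular mean.
    theta = math.atan2(
        sum(w * math.sin(t) for (_, t), w in pairs),
        sum(w * math.cos(t) for (_, t), w in pairs),
    )
    # Iterate: move theta by the weighted mean of signed arc lengths
    # (shortest angular differences), i.e. a step along the circle's geodesic.
    for _ in range(iters):
        step = sum(w * wrap_angle(t - theta) for (_, t), w in pairs) / total_w
        theta = wrap_angle(theta + step)
        if abs(step) < tol:
            break

    radius = sum(w * r for (r, _), w in pairs) / total_w
    return radius, theta


if __name__ == "__main__":
    deg = math.radians
    text = to_polar(math.cos(deg(170)), math.sin(deg(170)))
    audio = to_polar(math.cos(deg(-170)), math.sin(deg(-170)))
    r, t = geodesic_fuse([text, audio, None], [1.0, 1.0, 0.5])  # visual missing
    print(round(r, 6), round(math.degrees(abs(t)), 6))
```

With text at 170° and audio at −170°, a naive arithmetic mean of the angles gives 0°, which is on the opposite side of the circle. The sketch instead returns an angle of magnitude 180°, because it averages shortest arc lengths rather than raw angle values. Whether PGN's reliability weights are learned, and how its radial space generalises beyond two dimensions, is not stated in the abstract.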
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22329