Abstract: Aspect-category sentiment analysis (ACSA) on user reviews is a fundamental and challenging task that aims to identify all the aspect categories mentioned in a review and their corresponding sentiment polarities. In multimodal data, both text and images are closely associated with aspect sentiment, so how to exploit such fine-grained multimodal information to further improve ACSA is a question well worth studying. Most existing works integrate textual and visual information via attention mechanisms at a single level, neglecting the fact that the information contained in each modality can be divided into multiple levels. To address this issue, we propose a hierarchical classification approach that jointly models aspect category detection and aspect sentiment classification, and introduce a multimodal joint model (MJM) for aspect-category sentiment analysis. MJM exploits textual information at the word and context levels, together with visual information at the global, scene, and local levels, to fully mine fine-grained features in multimodal scenarios. The proposed model is evaluated on the MASAD dataset, where it outperforms baseline models on multiple evaluation metrics, and ablation experiments further validate its design.
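As a rough illustration of the kind of hierarchical multimodal modeling the abstract describes, the sketch below is not the authors' implementation: all module names, feature dimensions, and the fusion strategy (attention pooling followed by concatenation) are illustrative assumptions. It combines word-level and context-level text features with global-, scene-, and local-level image features, and feeds the fused representation to joint heads for aspect category detection and per-aspect sentiment classification.

```python
# Hypothetical sketch of a multi-level multimodal joint model (MJM-style).
# Not the paper's architecture: names, dimensions, and fusion choices are
# assumptions made for illustration only.
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Pools a sequence of features into one vector via learned attention."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (batch, seq, dim)
        w = torch.softmax(self.score(x), dim=1)    # attention weights
        return (w * x).sum(dim=1)                  # (batch, dim)


class MultimodalJointModel(nn.Module):
    def __init__(self, vocab_size, num_aspects, num_polarities, dim=256):
        super().__init__()
        # Text: word level (embeddings) and context level (BiGRU states).
        self.embed = nn.Embedding(vocab_size, dim)
        self.context = nn.GRU(dim, dim // 2, batch_first=True,
                              bidirectional=True)
        self.word_pool = AttentionPool(dim)
        self.ctx_pool = AttentionPool(dim)
        # Images: global, scene, and local (region) features are assumed
        # precomputed by CNN backbones and projected to a common dim.
        self.glob_proj = nn.Linear(2048, dim)
        self.scene_proj = nn.Linear(512, dim)
        self.local_proj = nn.Linear(1024, dim)
        self.local_pool = AttentionPool(dim)       # pool over regions
        # Joint heads: multi-label aspect detection and per-aspect sentiment.
        fused = dim * 5
        self.aspect_head = nn.Linear(fused, num_aspects)
        self.senti_head = nn.Linear(fused, num_aspects * num_polarities)
        self.num_aspects = num_aspects
        self.num_polarities = num_polarities

    def forward(self, tokens, img_global, img_scene, img_local):
        word = self.embed(tokens)                  # (B, T, dim)
        ctx, _ = self.context(word)                # (B, T, dim)
        feats = torch.cat([
            self.word_pool(word),
            self.ctx_pool(ctx),
            self.glob_proj(img_global),
            self.scene_proj(img_scene),
            self.local_pool(self.local_proj(img_local)),
        ], dim=-1)                                 # (B, 5*dim)
        aspect_logits = self.aspect_head(feats)    # (B, A)
        senti_logits = self.senti_head(feats).view(
            -1, self.num_aspects, self.num_polarities)  # (B, A, P)
        return aspect_logits, senti_logits


# Example usage with dummy inputs (sizes are illustrative, not MASAD's).
model = MultimodalJointModel(vocab_size=10000, num_aspects=12,
                             num_polarities=3)
tokens = torch.randint(0, 10000, (4, 30))      # 4 reviews, 30 tokens each
img_global = torch.randn(4, 2048)              # e.g., pooled CNN feature
img_local = torch.randn(4, 36, 1024)           # e.g., 36 region features
aspects, sentiments = model(tokens, img_global, torch.randn(4, 512), img_local)
print(aspects.shape, sentiments.shape)         # (4, 12) (4, 12, 3)
```

Under this reading, the aspect head would be trained with a multi-label loss over detected categories and the sentiment head with a classification loss restricted to the aspects actually present; the abstract does not specify the training objective, so this is only one plausible setup.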