Abstract: Target-oriented multi-modal sentiment classification (TMSC) aims to identify the sentiment polarity towards specific targets by considering multiple modalities, e.g., text and images. However, current methods often ignore spurious correlations within the data, which can cause models to learn irrelevant features that misrepresent the sentiment of targets. To address this issue, we propose a novel Cross-Modal Causal Scheduling framework (CMCS) that prioritizes learning multi-modal features with fewer spurious correlations. Specifically, we first design a Multi-modal Feature Selection model (MFS) that utilizes causal intervention to select relevant features. Second, we construct a Causal cross-Modal Scheduler (CMS) that assesses the causal effects of the selected features and uses these effects to further optimize the multi-modal learning process. Finally, we formulate the CMS and the multi-modal learning process as a bi-level optimization problem. In the lower-level optimization, the MFS is updated with the scheduled gradient, while in the upper-level optimization, the CMS is updated with the implicit gradient. Extensive experiments demonstrate that our method outperforms existing baseline methods on TMSC and can effectively schedule the learning process of multi-modal features based on causal effects.
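To make the bi-level structure concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that "scheduled gradient" means a softmax-weighted sum of per-feature gradients, and it replaces the MFS and CMS with a parameter vector `theta` and scheduler logits `alpha` on toy quadratic losses. All names (`run_bilevel`, `centers`, `target`) are hypothetical. The lower level moves `theta` with the weighted gradient, and the upper level updates `alpha` with an implicit-function-theorem gradient of a held-out loss.

```python
import math


def softmax(alpha):
    """Numerically stable softmax over a list of logits."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def run_bilevel(centers, target, outer_steps=200, inner_steps=10,
                lr_lower=0.5, lr_upper=0.5):
    """Toy bi-level loop (illustrative assumption, not the paper's method).

    Lower level: minimise sum_i w_i * 0.5 * ||theta - centers[i]||^2 over theta,
                 where w = softmax(alpha) plays the role of the scheduler.
    Upper level: minimise 0.5 * ||theta*(w) - target||^2 over alpha.
    """
    dim = len(target)
    n = len(centers)
    theta = [0.0] * dim   # stand-in for the learner's parameters
    alpha = [0.0] * n     # stand-in for the scheduler's parameters

    for _ in range(outer_steps):
        w = softmax(alpha)

        # Lower level: gradient steps with the scheduled (weighted) gradient.
        for _ in range(inner_steps):
            grad = [sum(w[i] * (theta[k] - centers[i][k]) for i in range(n))
                    for k in range(dim)]
            theta = [theta[k] - lr_lower * grad[k] for k in range(dim)]

        # Upper level: implicit gradient. The lower-level Hessian in theta is
        # (sum_i w_i) * I = I, so d theta*/d w_i = centers[i] - theta.
        resid = [theta[k] - target[k] for k in range(dim)]
        g_w = [dot(resid, [centers[i][k] - theta[k] for k in range(dim)])
               for i in range(n)]

        # Chain rule through softmax: dF/dalpha_i = w_i * (g_w_i - <w, g_w>).
        mean_g = dot(w, g_w)
        g_alpha = [w[i] * (g_w[i] - mean_g) for i in range(n)]
        alpha = [alpha[i] - lr_upper * g_alpha[i] for i in range(n)]

    return theta, softmax(alpha)


# Feature 0 agrees with the held-out target; feature 1 is a spurious pull.
theta, weights = run_bilevel(centers=[[1.0, 0.0], [-1.0, 2.0]],
                             target=[1.0, 0.0])
```

In this toy setting the scheduler shifts most of its weight to the feature whose gradient agrees with the held-out objective. With real networks the Hessian is not the identity, so the implicit gradient would need an approximate inverse-Hessian-vector product.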
External IDs: dblp:conf/pkdd/ZhaoLWL25