Multimodal Aspect Extraction with Region-Aware Alignment Network

NLPCC (1) 2020
Abstract: Fueled by the rise of social media, documents on these platforms (e.g., Twitter, Weibo) are increasingly multimodal in nature, containing images in addition to text. To automatically analyze the opinion information in multimodal data, it is crucial to perform aspect term extraction (ATE) on it. However, research focusing on multimodal ATE remains scarce. In this study, we go a step further than previous work by proposing a Region-aware Alignment Network (RAN) that aligns text with the object regions appearing in an image for the multimodal ATE task. Experiments on the Twitter dataset showcase the effectiveness of our proposed model. Further analysis shows that our model performs better at extracting emotionally polarized aspect terms.
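The abstract does not spell out the architecture, but the core idea it names, aligning each text token with detected image regions before tagging aspect terms, can be sketched with cross-modal attention. The following PyTorch snippet is an illustrative sketch only, not the authors' implementation: the module name, feature dimensions, attention-based fusion, and BIO tagging head are all assumptions chosen to make the alignment idea concrete.

```python
# A minimal sketch (not the paper's actual RAN) of region-aware alignment:
# each text token attends over object-region features, and the aligned
# representation feeds a BIO tagger for aspect term extraction.
import torch
import torch.nn as nn


class RegionAwareAlignment(nn.Module):
    # Hypothetical dimensions: 768 for BERT-style token states,
    # 2048 for Faster R-CNN-style region features, 3 BIO tags.
    def __init__(self, text_dim=768, region_dim=2048, hidden=256, num_tags=3):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, text_dim)  # map regions into text space
        self.attn = nn.MultiheadAttention(text_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_tags),  # B-aspect / I-aspect / O logits per token
        )

    def forward(self, token_states, region_feats):
        # token_states: (batch, seq_len, text_dim); region_feats: (batch, num_regions, region_dim)
        regions = self.region_proj(region_feats)
        # each token queries the object regions it should align with
        aligned, _ = self.attn(token_states, regions, regions)
        fused = torch.cat([token_states, aligned], dim=-1)
        return self.classifier(fused)  # (batch, seq_len, num_tags)


if __name__ == "__main__":
    model = RegionAwareAlignment()
    tokens = torch.randn(2, 16, 768)    # 2 sentences, 16 tokens each
    regions = torch.randn(2, 36, 2048)  # 36 detected regions per image
    print(model(tokens, regions).shape)  # torch.Size([2, 16, 3])
```

Token-to-region attention is one plausible reading of "aligns text with object regions"; the paper itself may use a different alignment or fusion mechanism.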