Visual Dialog with Explicit Co-Reference Resolution and Personality Consistency

Published: 01 Jan 2024, Last Modified: 21 May 2025 · IJCNN 2024 · CC BY-SA 4.0
Abstract: Visual dialog is a multi-turn, ongoing conversation in which a critical challenge is to understand questions containing co-reference ambiguities and to reply with a consistent personality throughout the dialog. Previous works that encode history inevitably introduce unexpected noise, since the whole dialog history is attended to implicitly. Moreover, no existing work focuses on the role of personality in visual dialog. In this paper, we conduct an in-depth study of dialog history and propose two novel modules. In the Explicit Co-Reference Resolution module, we perform co-reference resolution by modeling it as a sequence tagging task, which helps locate the history rounds relevant to the current question. In the User Personality Modeling module, we make the first attempt to model the user's interaction style in visual dialog, in terms of how detailed the user's answers were in previous dialog rounds, and generate a user preference score for each answer candidate. Both proposed modules are model-agnostic, so they are applicable to any VisDial model. By applying the two modules to several representative baseline models, we obtain significant boosts on all evaluation metrics, achieving new state-of-the-art results on VisDial v1.0 and even outperforming pre-trained models such as VD-BERT [1].
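The two module ideas can be illustrated with a minimal, rule-based sketch. This is an assumption-laden toy, not the paper's learned implementation: the function names, the lexical-overlap proxy for sequence tagging, and the answer-length proxy for "how detailed" are all hypothetical stand-ins for the neural components described in the abstract.

```python
# Illustrative sketch only: the paper uses learned models; here we use
# simple heuristics to show the shape of the two modules' inputs/outputs.

# Words to ignore when matching the current question against history
# (pronouns are what co-reference resolution must ground).
IGNORE = {"it", "its", "they", "them", "their", "he", "she", "his", "her",
          "is", "the", "a", "an", "what", "do", "does"}

def tag_relevant_rounds(question, history):
    """Tag each prior (question, answer) round as relevant (1) or not (0)
    for resolving co-references in the current question.
    Stand-in for the paper's sequence tagger: here, a round is 'relevant'
    if it shares any content word with the current question."""
    q_words = {w for w in question.lower().split() if w not in IGNORE}
    tags = []
    for q_h, a_h in history:
        h_words = set((q_h + " " + a_h).lower().split())
        tags.append(1 if q_words & h_words else 0)
    return tags

def preference_scores(candidates, history):
    """Score each answer candidate by how closely its length matches the
    user's average answer length in previous rounds -- a crude proxy for
    the abstract's 'how detailed were the user's answers' signal."""
    avg_len = sum(len(a.split()) for _, a in history) / max(len(history), 1)
    return [1.0 / (1.0 + abs(len(c.split()) - avg_len)) for c in candidates]

history = [("what color is the dog", "brown"), ("is it sunny", "yes")]
tags = tag_relevant_rounds("is the dog sleeping", history)   # [1, 0]
scores = preference_scores(["yes", "yes it is sleeping soundly"], history)
```

In the sketch, the tagger marks only the dog-related round as relevant context for "is the dog sleeping", and a terse user (one-word answers in history) yields a higher preference score for the terse candidate.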