Variational Disentangled Attention for Regularized Visual DialogDownload PDF

29 Sept 2021 (modified: 13 Feb 2023)ICLR 2022 Conference Withdrawn SubmissionReaders: Everyone
Keywords: attention mechanism, latent disentanglement, visual dialog, model regularization
Abstract: One of the most important challenges in a visual dialog is to effectively extract the information from a given image and its historical conversation which are related to the current question. Many studies adopt the soft attention mechanism in different information sources due to its simplicity and ease of optimization. However, some of visual dialogs are observed in a single round. This implies that there is no substantial correlation between individual rounds of questions and answers. This paper presents a unified approach to disentangled attention to deal with context-free visual dialogs. The question is disentangled in latent representation. In particular, an informative regularization is imposed to strengthen the dependence between vision and language by pretraining on the visual question answering before transferring to visual dialog. Importantly, a novel variational attention mechanism is developed and implemented by a local reparameterization trick which carries out a discrete attention to identify the relevant conversations in a visual dialog. A set of experiments are evaluated to illustrate the merits of the proposed attention and regularization schemes for context-free visual dialogs.
One-sentence Summary: A novel variational attention mechanism with latent disentanglement is proposed for regularized visual dialog
Supplementary Material: zip
4 Replies

Loading