GuessWhich? Visual dialog with attentive memory network

Lei Zhao, Xinyu Lyu, Jingkuan Song, Lianli Gao

Published: 2021, Last Modified: 11 Jun 2024, Pattern Recognit. 2021, License: CC BY-SA 4.0

Abstract: Highlights
• We use a memory network in the cooperative 'GuessWhich' game between Q-BOT and A-BOT. It reduces repetition in the generated dialogs and makes image retrieval more efficient.
• We propose a novel Attentive Memory Network that adds a fusion model to the memory network. The fusion model makes effective use of both the manually labeled caption and the image, so that the generated dialogs and the predicted image representation are visually grounded.
• Experiments on the VisDial 1.0 dataset show that our generated dialogs are natural and precise, and that our results exceed those of state-of-the-art 'GuessWhich'-based visual dialog algorithms. Extensive image retrieval experiments show that our method also produces more accurate retrieval results than the benchmarks.
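The highlights describe the model only at a high level. Below is a minimal sketch of how the pieces could fit together. It assumes three things: dot-product soft attention over memory slots (one slot per past question-answer round), a fusion step, and retrieval that ranks candidate images by Euclidean distance to the predicted image representation. It is not the authors' implementation. The averaging `fuse` function is a hypothetical stand-in for their fusion model, and the names `attentive_memory_read`, `fuse`, and `rank_of_target` are illustrative only.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attentive_memory_read(query, memory):
    """Soft attention over memory slots (assumed: one slot per past QA round).

    Returns the attention-weighted sum of the slots and the weights.
    """
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(memory[0])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]
    return read, weights


def fuse(read, caption_vec, image_vec):
    """Hypothetical fusion: element-wise average of the memory read,
    the caption embedding, and the image embedding (placeholder only)."""
    return [(r + c + v) / 3.0 for r, c, v in zip(read, caption_vec, image_vec)]


def rank_of_target(pred, candidates, target_idx):
    """1-based rank of the target image when candidates are sorted by
    squared Euclidean distance to the predicted representation."""
    dists = [sum((p - c) ** 2 for p, c in zip(pred, cand)) for cand in candidates]
    order = sorted(range(len(candidates)), key=lambda i: dists[i])
    return order.index(target_idx) + 1


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings of two past rounds
    query = [2.0, 0.0]                 # toy embedding of the current question
    read, weights = attentive_memory_read(query, memory)
    pred = fuse(read, caption_vec=[1.0, 0.0], image_vec=[1.0, 0.0])
    candidates = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    # The target (index 1) is the nearest candidate, so its rank is 1.
    print(rank_of_target(pred, candidates, target_idx=1))
```

A lower rank means the predicted representation lies closer to the true image, so it serves as a simple proxy for the retrieval accuracy that the highlights refer to. With real features, the toy vectors would be replaced by learned dialog, caption, and image embeddings.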