Hadamard Product for Low-rank Bilinear Pooling

Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Oct 17, 2016 (modified: Mar 14, 2017) | ICLR 2017 conference submission | readers: everyone
  • Abstract: Bilinear models provide richer representations than linear models. They have been applied to various visual tasks, such as object recognition, segmentation, and visual question-answering, achieving state-of-the-art performance by taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, which limits their applicability to computationally complex tasks. We propose low-rank bilinear pooling using the Hadamard product for an efficient attention mechanism in multimodal learning. We show that our model outperforms compact bilinear pooling on visual question-answering tasks, achieving state-of-the-art results on the VQA dataset while being more parsimonious.
  • TL;DR: A new state of the art on the VQA (real image) dataset using an attention mechanism based on low-rank bilinear pooling
  • Conflicts: snu.ac.kr, navercorp.com
  • Keywords: Deep learning, Supervised Learning, Multi-modal learning
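
The core idea named in the abstract can be illustrated with a minimal sketch. It assumes a scalar bilinear form x^T W y whose weight matrix is factorized as W = U V^T; in that case the form equals the sum of the element-wise (Hadamard) product of the two projected inputs, U^T x and V^T y. The names U, V, the rank d, and the dimensions below are illustrative assumptions, not taken from the listing, and the sketch leaves out whatever output projections, nonlinearities, and attention the full model uses.

```python
import random


def matvec_t(M, v):
    """Return M^T v for a matrix M stored as a list of rows."""
    rows, cols = len(M), len(M[0])
    return [sum(M[i][j] * v[i] for i in range(rows)) for j in range(cols)]


def full_bilinear(x, W, y):
    """Scalar bilinear form x^T W y with a full weight matrix W."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def low_rank_bilinear(x, U, V, y):
    """Same form with W = U V^T, computed via a Hadamard product."""
    px = matvec_t(U, x)   # project x to d dimensions
    py = matvec_t(V, y)   # project y to d dimensions
    hadamard = [a * b for a, b in zip(px, py)]  # element-wise product
    return sum(hadamard)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, m, d = 8, 6, 2          # hypothetical input sizes and rank
U = random_matrix(n, d, rng)
V = random_matrix(m, d, rng)
# Materialize W = U V^T only to check the identity.
W = [[sum(U[i][k] * V[j][k] for k in range(d)) for j in range(m)]
     for i in range(n)]
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.uniform(-1.0, 1.0) for _ in range(m)]

full = full_bilinear(x, W, y)
low = low_rank_bilinear(x, U, V, y)

full_params = n * m        # entries in W
low_params = d * (n + m)   # entries in U and V
print(abs(full - low) < 1e-9, full_params, low_params)
```

With these hypothetical sizes the factorized form stores d(n + m) = 28 weights instead of nm = 48, which is the kind of saving the abstract's parsimony claim refers to; the gap widens as the input dimensions grow relative to the rank.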