Progressive Attention Networks for Visual Attribute Prediction

Paul Hongsuck Seo; Zhe Lin; Scott Cohen; Xiaohui Shen; Bohyung Han

06 Jul 2025 (modified: 22 Jun 2025). Submitted to ICLR 2017. Readers: Everyone

Abstract: We propose a novel attention model that can accurately attend to target objects of various scales and shapes in images. The model is trained to gradually suppress irrelevant regions of an input image through a progressive attentive process over multiple layers of a convolutional neural network. The attentive process in each layer determines whether to pass or suppress the features at each spatial location for use in the next layer. We further employ local context to estimate the attention probability at each location, since it is difficult to infer accurate attention from the feature vector of a single location alone. Experiments on synthetic and real datasets show that the proposed attention network outperforms traditional attention methods in visual attribute prediction tasks.
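
The layer-by-layer gating described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the sigmoid gate, the window-averaged local context, the per-layer `(weights, bias)` parameters, and every function name below are assumptions made for illustration, and the convolutions a real CNN would apply between attention steps are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def local_context(fmap, i, j, radius):
    """Average the feature vectors in the window around (i, j), clipped at the borders.

    Assumption: a plain window average stands in for the local context.
    """
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    total = [0.0] * channels
    count = 0
    for y in range(max(0, i - radius), min(height, i + radius + 1)):
        for x in range(max(0, j - radius), min(width, j + radius + 1)):
            for c in range(channels):
                total[c] += fmap[y][x][c]
            count += 1
    return [t / count for t in total]


def attention_map(fmap, weights, bias, radius=1):
    """Attention probability per location, computed from its local context."""
    probs = []
    for i in range(len(fmap)):
        row = []
        for j in range(len(fmap[0])):
            context = local_context(fmap, i, j, radius)
            score = sum(w * v for w, v in zip(weights, context)) + bias
            row.append(sigmoid(score))  # hypothetical gate: sigmoid of a linear score
        probs.append(row)
    return probs


def gate(fmap, probs):
    """Scale each feature vector by its attention probability (pass or suppress)."""
    return [
        [[p * v for v in vec] for vec, p in zip(frow, prow)]
        for frow, prow in zip(fmap, probs)
    ]


def progressive_attention(fmap, layer_params, radius=1):
    """Apply one attention step per layer; each step feeds the gated map to the next."""
    maps = []
    for weights, bias in layer_params:
        probs = attention_map(fmap, weights, bias, radius)
        fmap = gate(fmap, probs)
        maps.append(probs)
    return fmap, maps


# Toy input: a 3x3 single-channel feature map with one strong activation.
toy = [[[0.0], [0.0], [0.0]],
       [[0.0], [9.0], [0.0]],
       [[0.0], [0.0], [0.0]]]
toy_params = [([1.0], -1.0), ([0.5], 0.0)]  # hypothetical per-layer (weights, bias)
attended, attention_maps = progressive_attention(toy, toy_params)
```

Because every gate lies strictly between 0 and 1, each additional layer can only shrink a feature's magnitude, which mirrors the gradual suppression of irrelevant regions that the abstract describes.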

TL;DR: Progressive attention model that accurately attends to the target objects of various scales and shapes through multiple CNN layers.

Conflicts: postech.ac.kr, adobe.com

Keywords: Deep learning, Computer vision, Multi-modal learning

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/progressive-attention-networks-for-visual/code)
