Variational Structured Attention Networks for Dense Pixel-Wise Prediction

Guanglei Yang; Paolo Rota; Xavier Alameda-Pineda; Dan Xu; Mingli Ding; Elisa Ricci

28 Sept 2020 (modified: 12 Oct 2025); ICLR 2021 Conference Blind Submission; Readers: Everyone

Keywords: attention network, pixel-wise prediction

Abstract: State-of-the-art performance in dense pixel-wise prediction tasks is obtained with specifically designed convolutional networks. These models often benefit from attention mechanisms that allow better learning of deep representations. Recent works have shown the importance of estimating both spatial- and channel-wise attention tensors. In this paper, we propose a unified approach to jointly estimate spatial attention maps and channel attention vectors so as to structure the resulting attention tensor. Moreover, we integrate the estimation of the attention within a probabilistic framework, leading to VarIational STructured Attention networks (VISTA). We implement the inference rules within the neural network, thus allowing for joint learning of the probabilistic and the CNN front-end parameters. Importantly, as demonstrated by our extensive empirical evaluation on six large-scale datasets, VISTA outperforms the state of the art in multiple continuous and discrete pixel-level prediction tasks, thus confirming the benefit of structuring the attention tensor and of inferring it within a probabilistic formulation.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/variational-structured-attention-networks-for/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=7iBH8mQg8x
