Language Controls More Than Top-Down Attention: Modulating Bottom-Up Visual Processing with Referring Expressions

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: Referring Expression Understanding, Language-Vision Problems, Grounded Language Understanding
Abstract: How best to integrate linguistic and perceptual processing in multimodal tasks is an important open problem. In this work, we argue that the common technique of using language to direct visual attention over high-level visual features may not be optimal; using language throughout the bottom-up visual pathway, from pixels to high-level features, may be necessary. Our experiments on several English referring expression datasets show significant improvements when language is used to control the filters for bottom-up visual processing in addition to top-down attention.
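The mechanism the abstract describes can be illustrated with a minimal PyTorch-style sketch. This is not the paper's implementation; it assumes a FiLM-style scheme in which an encoded referring expression predicts per-channel scale and shift parameters that modulate convolutional features at every bottom-up stage, alongside a standard language-driven top-down attention map over the final feature grid. All module names, shapes, and hyperparameters below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LanguageModulatedBlock(nn.Module):
    """One bottom-up stage: conv features scaled and shifted by language (FiLM-style)."""

    def __init__(self, in_ch, out_ch, lang_dim):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # The language embedding predicts per-channel gamma (scale) and beta (shift).
        self.film = nn.Linear(lang_dim, 2 * out_ch)

    def forward(self, x, lang):
        h = self.conv(x)
        gamma, beta = self.film(lang).chunk(2, dim=-1)   # (B, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)        # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return F.relu(gamma * h + beta)


class ReferringModel(nn.Module):
    """Toy model: language modulates every bottom-up stage and drives top-down attention."""

    def __init__(self, lang_dim=256, channels=(3, 32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList(
            LanguageModulatedBlock(c_in, c_out, lang_dim)
            for c_in, c_out in zip(channels[:-1], channels[1:])
        )
        self.attn_query = nn.Linear(lang_dim, channels[-1])  # top-down attention query
        self.head = nn.Linear(channels[-1], 1)                # e.g., referred-object score

    def forward(self, image, lang):
        h = image
        for block in self.blocks:
            h = block(h, lang)          # bottom-up processing modulated by language
            h = F.max_pool2d(h, 2)
        b, c, ht, wd = h.shape
        feats = h.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        q = self.attn_query(lang).unsqueeze(-1)              # (B, C, 1)
        attn = torch.softmax(feats @ q / c**0.5, dim=1)      # (B, H*W, 1) top-down attention
        pooled = (attn * feats).sum(dim=1)                   # (B, C)
        return self.head(pooled)


if __name__ == "__main__":
    model = ReferringModel()
    image = torch.randn(2, 3, 64, 64)
    lang = torch.randn(2, 256)          # stand-in for an encoded referring expression
    print(model(image, lang).shape)     # torch.Size([2, 1])
```

In this sketch the language signal enters twice: once through the FiLM parameters that gate each convolutional stage (the bottom-up control the abstract argues for), and once through the attention query over the final feature grid (the conventional top-down use).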
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We modulate both top-down and bottom-up visual processing with referring expressions.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=HCnZWJgNTb