Learning to reason iteratively and parallelly for complex visual reasoning

22 Sept 2023 (modified: 25 Mar 2024) | ICLR 2024 Conference Withdrawn Submission
Keywords: visual reasoning; vision-language modules; memory-augmented reasoning; parallel computation
Abstract: Iterative step-by-step computation is beneficial for multi-step reasoning scenarios in which individual operations need to be computed, stored, and recalled dynamically (e.g., when computing the query “determine the color of the pen to the left of the child in the red t-shirt sitting at the white table”). Conversely, parallel computation is beneficial for executing operations that are mutually independent and can be performed simultaneously rather than sequentially (e.g., when counting individual colors for the query “determine the maximally occurring color amongst all t-shirts”). Accordingly, in this work, we introduce a novel fully neural iterative and parallel reasoning mechanism (IPRM) that combines the benefits of iterative computation with the ability to perform distinct operations simultaneously. Our experiments on various visual question answering and reasoning benchmarks indicate that IPRM exhibits stronger reasoning capabilities and generalization than existing recurrent as well as transformer-based reasoning and vision-language interaction mechanisms, while requiring fewer parameters and computation steps. Notably, IPRM achieves state-of-the-art zero-shot performance on the challenging CLEVR-Humans dataset and outperforms prior task-specific methods on the NLVR and CLEVR-CoGen benchmarks. Further, IPRM’s computation can be visualized across reasoning steps, aiding interpretability and diagnosis of its reasoning and outputs.
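
The abstract describes IPRM only at a high level. The following is a minimal, hedged sketch of how an "iterative and parallel" reasoning step could be realized in PyTorch: a recurrent memory is updated over several reasoning steps (iterative computation), and within each step several independent attention "operations" over the visual features are executed simultaneously (parallel computation). All names, shapes, and design choices here (IPRMSketch, n_ops, n_steps, the GRU-based memory) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' released code): combines an iterative memory,
# updated over n_steps, with n_ops attention "operations" computed in parallel
# at every step. Names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class IPRMSketch(nn.Module):
    def __init__(self, dim=256, n_ops=4, n_steps=6):
        super().__init__()
        self.n_ops, self.n_steps = n_ops, n_steps
        # One query per parallel operation, conditioned on language + current memory.
        self.op_queries = nn.Linear(2 * dim, n_ops * dim)
        self.kv_proj = nn.Linear(dim, dim)
        self.combine = nn.Linear(n_ops * dim, dim)   # fuse the parallel results
        self.memory_update = nn.GRUCell(dim, dim)    # iterative (step-by-step) state

    def forward(self, vis_feats, lang_feat):
        # vis_feats: (B, N, dim) visual tokens; lang_feat: (B, dim) pooled question.
        B, N, dim = vis_feats.shape
        keys = self.kv_proj(vis_feats)                       # (B, N, dim)
        memory = torch.zeros(B, dim, device=vis_feats.device)
        for _ in range(self.n_steps):                        # iterative computation
            q = self.op_queries(torch.cat([lang_feat, memory], dim=-1))
            q = q.view(B, self.n_ops, dim)                   # n_ops parallel queries
            attn = torch.softmax(q @ keys.transpose(1, 2) / dim ** 0.5, dim=-1)
            op_results = attn @ vis_feats                    # (B, n_ops, dim), computed at once
            fused = self.combine(op_results.reshape(B, -1))  # merge parallel results
            memory = self.memory_update(fused, memory)       # store/recall across steps
        return memory                                        # final reasoning state


# Usage: answer logits would be read out from the final memory state.
model = IPRMSketch()
vis = torch.randn(2, 49, 256)    # e.g. a 7x7 grid of image features
lang = torch.randn(2, 256)       # pooled question embedding
out = model(vis, lang)           # (2, 256)
```

The intent of the sketch is only to show where the two modes of computation sit: the loop carries dynamic storage and recall across steps, while the per-step multi-query attention captures operations that need not be sequenced.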
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6399