Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023. Readers: Everyone
Keywords: general-purpose vision, benchmark, visual representation
TL;DR: We propose a comprehensive benchmark for holistic evaluation of general-purpose visual representations, along with a general framework that bridges the gaps among visual tasks and accommodates arbitrary representations
Abstract: Current computer vision models, unlike the human visual system, cannot yet achieve general-purpose visual understanding. Existing efforts toward general vision models are limited to a narrow range of tasks and offer no overarching framework for performing visual tasks holistically. We present a new comprehensive benchmark, General-purpose Visual Understanding Evaluation (G-VUE), covering the full spectrum of visual cognitive abilities across four disjoint functional domains: Perceive, Ground, Reason, and Act. The four domains are embodied in 11 carefully curated tasks, ranging from 3D reconstruction to visual reasoning and manipulation. Along with the benchmark, we provide a general encoder-decoder framework that accommodates arbitrary visual representations on all 11 tasks. Using our benchmark and framework, we evaluate 7 typical visual representations and observe that (1) Transformer architectures and larger pre-training data empirically lead to more general-purpose representations, (2) language plays a significant role in learning versatile visual representations, and (3) cross-task correlations reveal a subtle shared structure among the tasks despite their distinctions, which could be evidence of general-purpose capability. We argue that instead of pursuing general-purpose vision models through end-to-end multi-task training, it is more reasonable to evaluate and investigate representations; this helps digest emerging pre-trained vision models and hopefully sheds light on general intelligence.
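To make the described encoder-decoder framework concrete, here is a minimal sketch of how an arbitrary pretrained visual backbone could be paired with lightweight task-specific decoder heads, in the spirit of what the abstract describes. This is not the authors' released code; the class name, task names, and head shapes below are hypothetical placeholders chosen only for illustration.

```python
# Minimal sketch (not the authors' implementation): a shared visual encoder
# produces a generic representation, and per-task decoder heads map it to each
# task's output space. All names here are illustrative assumptions.
import torch
import torch.nn as nn


class EncoderDecoderProbe(nn.Module):
    def __init__(self, encoder: nn.Module, task_heads: dict):
        super().__init__()
        self.encoder = encoder                        # arbitrary pretrained visual backbone
        self.task_heads = nn.ModuleDict(task_heads)   # one lightweight decoder per task

    def forward(self, images: torch.Tensor, task: str) -> torch.Tensor:
        features = self.encoder(images)               # shared general-purpose representation
        return self.task_heads[task](features)        # task-specific decoding


# Toy usage: a tiny convolutional encoder with two illustrative heads
# (a classification-style head and a pose-regression-style head).
encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = EncoderDecoderProbe(
    encoder,
    task_heads={
        "vqa": nn.Linear(64, 1000),         # e.g., answer classification
        "manipulation": nn.Linear(64, 7),   # e.g., end-effector pose
    },
)
out = model(torch.randn(2, 3, 224, 224), task="vqa")
```

In such a setup, swapping the `encoder` for a different pretrained model while keeping the decoders fixed is what allows different visual representations to be compared on the same set of tasks.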
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Infrastructure (eg, datasets, competitions, implementations, libraries)