Abstract: The general recognition of objects, people, actions and scene types has been a core focus of computer vision research. However, now that we have achieved a degree of success on these problems, it is time to define new problems that will spur us to reach the next level of visual intelligence. Visual common sense is critical for building intelligent agents that can be useful in dynamic, novel environments.
But what exactly is visual common sense? We suggest that the ability to make intelligent assessments of where things might be when they are not directly visible is a critical and ubiquitous capability shared by humans and other intelligent beings, and a fundamental component of visual common sense. Humans regularly demonstrate the ability to make decisions in the absence of explicit visual cues (Fig. 1). This sort of "intelligent search" is a prominent example of visual common sense, and we believe it represents a skill that will be essential in developing intelligent agents.
Closely related to our work are earlier efforts to incorporate contextual information into visual prediction [5, 9, 10, 11]. We believe a formal benchmark for such capabilities, in their most basic form, would be a valuable addition.