Artificial intelligence (AI) scene understanding systems can benefit from a large visual field of view (FOV). Some existing systems already employ multiple cameras to extend their FOV; however, the resulting increase in image size and quality places an overwhelming demand on the acquisition and computing resources of such systems. An effective solution is to sub-sample the FOV without impairing the model's performance on complex visual tasks. In this paper, we show that a variable sampling scheme, inspired by human vision, outperforms a uniform sampling scheme by 2% accuracy (65% vs. 63%) on the challenging task of scene visual question answering (VQA), under a limited sample budget (3% of the full-resolution baseline). The improvement is achieved without any image scanning, and the variable resolution peaks at an arbitrarily chosen fixed image location. Our study also compared the schemes on basic visual sub-tasks, in particular image classification and object detection. Comparing the variable- and uniform-resolution models revealed differences in the learned representations that yield consistently better performance for the variable-resolution models. We show that the variable sampling scheme allows the models to benefit in low-resolution areas by propagating information from the finer-resolution areas, while at the same time the higher-resolution areas benefit from contextual information carried by the lower-resolution periphery. The results show the potential of this biologically-inspired image representation to improve the design of visual acquisition and processing models in future AI-based systems.
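To make the comparison concrete, the following is a minimal sketch of the two sampling strategies described above: a uniform grid versus a variable (foveated) layout whose density peaks at a fixed image location, both restricted to the same sample budget (about 3% of the pixels). The log-polar ring layout, the function names, and the choice of image center as the fovea are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def uniform_grid(h, w, budget):
    """Evenly spaced grid of roughly `budget` sample coordinates over an h x w image."""
    n = int(np.sqrt(budget))
    ys = np.linspace(0, h - 1, n)
    xs = np.linspace(0, w - 1, n)
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([gy.ravel(), gx.ravel()], axis=1).astype(int)

def foveated_samples(h, w, budget, center=None, rings=32):
    """Variable-resolution sampling: dense near a fixed fovea, sparse in the periphery
    (a log-polar-style layout, used here only as an illustrative assumption)."""
    if center is None:
        center = (h / 2.0, w / 2.0)          # fixed fovea location (assumed: image center)
    cy, cx = center
    max_r = np.hypot(max(cy, h - cy), max(cx, w - cx))
    per_ring = budget // rings
    # Ring radii grow geometrically, so sample density decays with eccentricity.
    radii = np.geomspace(1.0, max_r, rings)
    pts = []
    for r in radii:
        theta = np.linspace(0.0, 2.0 * np.pi, per_ring, endpoint=False)
        ys = np.clip(cy + r * np.sin(theta), 0, h - 1)
        xs = np.clip(cx + r * np.cos(theta), 0, w - 1)
        pts.append(np.stack([ys, xs], axis=1))
    return np.concatenate(pts, axis=0).astype(int)

# Example: the same ~3% sample budget on a 512 x 512 image.
h, w = 512, 512
budget = int(0.03 * h * w)
image = np.random.rand(h, w)                  # stand-in for a real input image
uni = uniform_grid(h, w, budget)
fov = foveated_samples(h, w, budget)
uni_values = image[uni[:, 0], uni[:, 1]]      # uniformly sub-sampled pixel values
fov_values = image[fov[:, 0], fov[:, 1]]      # foveated sub-sampled pixel values
```

Both layouts draw the same number of samples; they differ only in how those samples are distributed, which is the variable the paper's comparison isolates.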