An Investigation of Deep Visual Architectures Based on Preprocess Using the Retinal Transform

Published: 2020, Last Modified: 13 Nov 2024 · ECCV Workshops (1) 2020 · CC BY-SA 4.0
Abstract: This work investigates the utility of a biologically motivated software retina model to pre-process and compress visual information prior to training and classification by deep convolutional neural networks (CNNs), in the context of object recognition for robotics and egocentric perception. We captured a dataset of video clips in a standard office environment using a hand-held high-resolution digital camera under uncontrolled illumination. Individual video sequences were captured over the observable view hemisphere for each of 20 objects, with several sequences per object to serve training and validation in an object recognition task. A key objective of this project was to identify appropriate network architectures for processing retina-transformed input images and, in particular, to determine the utility of spatio-temporal CNNs versus simple feed-forward CNNs. A number of CNN architectures were devised and their classification performance compared accordingly. The project demonstrated that the image classification task could be performed with an accuracy exceeding 98% under varying lighting conditions when objects were viewed from distances similar to those used in training.
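The abstract does not specify the retina model's implementation; software retinas of this kind are commonly based on space-variant, log-polar-style sampling that is dense at the fovea and sparse in the periphery, which is what compresses the input before it reaches the CNN. The sketch below is a minimal, hypothetical illustration of that idea only (the function name, grid sizes, and nearest-neighbour sampling are assumptions, not details from the paper):

```python
import numpy as np

def log_polar_retina(image, n_rings=32, n_sectors=64):
    """Resample a square grayscale image onto a log-polar grid.

    Rings are spaced logarithmically in radius from the image centre,
    mimicking the foveal/peripheral density gradient of a biological
    retina. Returns an (n_rings, n_sectors) array: a compressed,
    space-variant view of the input suitable as CNN input.
    NOTE: illustrative sketch only, not the paper's actual model.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    # Logarithmic radii: dense near the centre (fovea), sparse at the edge.
    radii = np.exp(np.linspace(0.0, np.log(max_r), n_rings))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_sectors, endpoint=False)
    # Cartesian sample coordinates for every (ring, sector) pair.
    ys = cy + radii[:, None] * np.sin(thetas[None, :])
    xs = cx + radii[:, None] * np.cos(thetas[None, :])
    # Nearest-neighbour sampling, clipped to stay inside the image.
    yi = np.clip(np.round(ys).astype(int), 0, h - 1)
    xi = np.clip(np.round(xs).astype(int), 0, w - 1)
    return image[yi, xi]
```

With the defaults above, a 256×256 image (65,536 pixels) is reduced to a 32×64 representation (2,048 samples), illustrating the kind of compression a retina-style pre-process provides before CNN training.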