Bio-inspired parameter reuse: Exploiting inter-frame representation similarity with recurrence for accelerating temporal visual processing

Published: 02 Nov 2023, Last Modified: 18 Dec 2023UniReps PosterEveryoneRevisionsBibTeX
Supplementary Material: pdf
Keywords: RNN, bio-inspired architecture, efficient deep neural network
TL;DR: RecSlowFast is a bio-inspired recurrent neural network framework with variable recurrent steps for temporal visual processing, improving computation efficiency and maintaining task accuracy using a single set of parameters.
Abstract: Feedforward neural networks are the dominant approach in current computer vision research. They typically do not incorporate recurrence, which is a prominent feature of biological vision brain circuitry. Inspired by biological findings, we introduce $\textbf{RecSlowFast}$, a recurrent slow-fast framework aimed at showing how recurrence can be useful for temporal visual processing. We perform a variable number of recurrent steps of certain layers in a network receiving input video frames, where each recurrent step is equivalent to a feedforward layer with weights reuse. By harnessing the hidden states extracted from the previous input frame, we reduce the computation cost by executing fewer recurrent steps on temporally correlated consecutive frames, while keeping good task accuracy. The early termination of the recurrence can be dynamically determined through newly introduced criteria based on the distance between hidden states and without using any auxiliary scheduler network. RecSlowFast $\textbf{reuses a single set of parameters}$, unlike previous work which requires one computationally heavy network and one light network, to achieve the speed versus accuracy trade-off. Using a new $\textit{Temporal Pathfinder}$ dataset proposed in this work, we evaluate RecSlowFast on a task to continuously detect the longest evolving contour in a video. The slow-fast inference mechanism speeds up the average frame per second by 279% on this dataset with comparable task accuracy using a desktop GPU. We further demonstrate a similar trend on CamVid, a video semantic segmentation dataset.
Track: Proceedings Track
Submission Number: 32