Learning a Convolutional Bilinear Sparse Code for Natural Videos

Published: 02 Oct 2019, Last Modified: 05 May 2023. Real Neurons & Hidden Units @ NeurIPS 2019 Poster.
TL;DR: We extend bilinear sparse coding and leverage video sequences to learn dynamic filters.
Keywords: Unsupervised Learning, Spatio-Temporal Features, Sparse Coding, Equivariance, Capsules
Abstract: In contrast to the monolithic deep architectures used in deep learning today for computer vision, the visual cortex processes retinal images via two functionally distinct but interconnected networks: the ventral pathway for processing object-related information and the dorsal pathway for processing motion and transformations. Inspired by this cortical division of labor and properties of the magno- and parvocellular systems, we explore an unsupervised approach to feature learning that jointly learns object features and their transformations from natural videos. We propose a new convolutional bilinear sparse coding model that (1) allows independent feature transformations and (2) is capable of processing large images. Our learning procedure leverages smooth motion in natural videos. Our results show that our model can learn groups of features and their transformations directly from natural videos in a completely unsupervised manner. The learned "dynamic filters" exhibit certain equivariance properties, resemble cortical spatiotemporal filters, and capture the statistics of transitions between video frames. Our model can be viewed as one of the first approaches to demonstrate unsupervised learning of primary "capsules" (proposed by Hinton and colleagues for supervised learning) and has strong connections to the Lie group approach to visual perception.
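To make the bilinear sparse coding idea concrete, here is a minimal NumPy sketch of the kind of generative model the abstract describes: an image patch is reconstructed by a bilinear combination of sparse "object" coefficients and "transformation" coefficients acting on a learned dictionary. All dimensions, variable names, and the exact factorization are illustrative assumptions, not the paper's actual formulation (which is convolutional and learned from video).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
P = 64    # flattened patch size, e.g. an 8x8 patch
M = 6     # number of object features
K = 4     # number of transformation components per feature

# Hypothetical bilinear dictionary: one basis vector per
# (feature, transformation) pair
W = rng.standard_normal((M, K, P))

# Sparse object coefficients x and transformation coefficients y;
# the patch is reconstructed as the bilinear form
#   I_hat[p] = sum_i sum_j x[i] * y[i, j] * W[i, j, p]
x = np.zeros(M)
x[2] = 1.0              # a single active object feature
y = np.zeros((M, K))
y[2, 1] = 0.8           # that feature's current transformation state

I_hat = np.einsum('i,ij,ijp->p', x, y, W)

# With one active (i, j) pair, the reconstruction reduces to a
# scaled basis vector: 0.8 * W[2, 1]
```

In this factored view, holding the object coefficients fixed while varying the transformation coefficients moves the same feature through a family of appearances, which is what lets smooth motion in video sequences drive the learning of "dynamic filters."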