Deep Learning of Invariant Features via Simulated Fixations in Video

Will Y. Zou, Andrew Y. Ng, Shenghuo Zhu, Kai Yu

NIPS 2012 (modified: 11 Nov 2022)

Abstract: We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariances that become increasingly complex at higher levels of the hierarchy. Although learned from videos, our features are spatial rather than spatio-temporal, and are well suited for extracting features from still images. We apply our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig) and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve a state-of-the-art recognition accuracy of 61% on the STL-10 dataset.
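The temporal slowness constraint can be illustrated with a minimal sketch. It assumes, beyond what this abstract states, that a module's features are L2-pooled responses of linear filters `W` and that slowness is measured as the L1 distance between the pooled features of consecutive tracked frames. All function names are hypothetical and this is not presented as the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def linear_responses(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Filter responses W x for one input patch x (one row of W per filter)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]


def l2_pool(responses: Sequence[float], groups: Sequence[Sequence[int]]) -> Vector:
    """Assumed L2 pooling: sqrt of the sum of squared responses in each group."""
    return [math.sqrt(sum(responses[j] ** 2 for j in g)) for g in groups]


def slowness_penalty(pooled_seq: Sequence[Sequence[float]]) -> float:
    """Sum over t of the L1 distance between pooled features at t and t+1."""
    total = 0.0
    for prev, curr in zip(pooled_seq, pooled_seq[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, curr))
    return total


def sequence_penalty(
    W: Sequence[Sequence[float]],
    groups: Sequence[Sequence[int]],
    frames: Sequence[Sequence[float]],
) -> float:
    """Slowness penalty for one tracked sequence of patches (hypothetical helper)."""
    pooled = [l2_pool(linear_responses(W, x), groups) for x in frames]
    return slowness_penalty(list(pooled))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity filters, for illustration only
    groups = [[0, 1]]             # both filters pooled into one feature
    # Responses change frame to frame, but their pooled norm stays 5.0,
    # so this sequence incurs no slowness penalty.
    print(sequence_penalty(W, groups, [[3.0, 4.0], [4.0, 3.0], [0.0, 5.0]]))
```

Minimizing this penalty on its own is trivially achieved by `W = 0`, so a practical objective would pair it with something that prevents collapse, such as a reconstruction term or an orthogonality constraint on `W`. Which mechanism the authors use is not specified in this abstract.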
