Recurrent Temporal Deep Field for Semantic Video Labeling

ECCV (5) 2016
Abstract: This paper presents a new deep architecture, called Recurrent Temporal Deep Field (RTDF), for semantic video labeling. RTDF is a conditional random field (CRF) that combines a deconvolution neural network (DeconvNet) with a recurrent temporal restricted Boltzmann machine (RTRBM). DeconvNet is grounded on the pixels of a new frame and estimates the unary potential of the CRF. RTRBM estimates a high-order potential of the CRF by capturing long-range spatiotemporal dependencies among the pixel labels that RTDF has already predicted in previous frames. We derive a mean-field inference algorithm that jointly predicts all latent variables in both the RTRBM and the CRF, and we train all DeconvNet, RTRBM, and CRF parameters jointly in an end-to-end manner. This joint learning and inference integrates the three components into a unified deep model, RTDF. Our evaluation on the benchmark YouTube Faces Database (YFDB) and the Cambridge-driving Labeled Video Database (CamVid) demonstrates that RTDF outperforms the state of the art both qualitatively and quantitatively.
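To make the structure of the inference concrete, below is a minimal, hypothetical NumPy sketch of a mean-field-style update that combines a per-pixel unary term (standing in for the DeconvNet scores) with a per-pixel temporal bias (standing in for the RTRBM's high-order potential over previously predicted frames) and a simple 4-neighbor smoothness term. The paper's actual potentials, RTRBM coupling, and update equations differ; all names and parameters here (mean_field_labeling, alpha, beta, num_iters) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def neighbor_average(q):
    """Average the marginals of the 4-connected neighbors (zero padding at borders)."""
    acc = np.zeros_like(q)
    acc[1:, :] += q[:-1, :]
    acc[:-1, :] += q[1:, :]
    acc[:, 1:] += q[:, :-1]
    acc[:, :-1] += q[:, 1:]
    return acc / 4.0


def mean_field_labeling(unary, temporal_bias, num_iters=5, alpha=1.0, beta=1.0):
    """Toy mean-field update for per-pixel label marginals.

    unary:         (H, W, L) class scores for the current frame (DeconvNet stand-in)
    temporal_bias: (H, W, L) scores summarizing label history (RTRBM stand-in)
    Returns (H, W, L) approximate marginals Q.
    """
    q = softmax(unary)
    for _ in range(num_iters):
        # Combine current-frame evidence, temporal evidence, and a pairwise
        # smoothness message from the neighbors' current marginals.
        q = softmax(unary + alpha * temporal_bias + beta * neighbor_average(q))
    return q


if __name__ == "__main__":
    H, W, L = 4, 4, 3
    rng = np.random.default_rng(0)
    unary = rng.normal(size=(H, W, L))     # pretend DeconvNet output
    history = rng.normal(size=(H, W, L))   # pretend RTRBM summary of past frames
    marginals = mean_field_labeling(unary, history)
    print(marginals.argmax(axis=-1))       # per-pixel hard labels
```

In this toy version the temporal term is a fixed bias; in RTDF it is produced by the RTRBM and is coupled to its hidden units, so the joint mean-field updates also refine the temporal evidence rather than treating it as given.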