Video object segmentation by Multi-Scale Pyramidal Multi-Dimensional LSTM with generated depth context

Qiurui Wang, Chun Yuan

Published: 2016, Last Modified: 15 May 2025, ICIP 2016, CC BY-SA 4.0

Abstract: Existing deep neural networks, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), typically treat volumetric video data as a collection of single images and process one frame at a time, so the relationships between frames can hardly be fully exploited. In addition, depth context plays a unique role in motion scenes for primates, but it is seldom used when no depth labels are available. In this paper, we use a more suitable architecture, Multi-Scale Pyramidal Multi-Dimensional Long Short-Term Memory (MSPMD-LSTM), to reveal the strong relevance among video frames. Furthermore, depth context is extracted and refined to enhance the performance of the model. Experiments demonstrate that our models yield competitive results on the YouTube-Objects and SegTrack v2 datasets.