Learning to Segment Video Object With Accurate Boundaries

Jingchun Cheng, Yuhui Yuan, Yali Li, Jingdong Wang, Shengjin Wang

2021 (modified: 08 Nov 2022)IEEE Trans. Multim. 2021Readers: Everyone

Abstract: Video object segmentation has attracted considerable research interest these years. Top-performing video object segmentation methods mainly rely on fully convolutional neural networks which are specifically trained for predicting high-performance masks, resulting in a lack of preciseness in boundary details. This paper tackles the problem of predicting both mask-accurate and boundary-precise segmentation masks in videos. To solve this problem, we propose a simple and efficient network structure: the Mask-boundAry-Consistent Network ( <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net ). The <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net is an end-to-end fully convolutional network, where both mask and boundaries are jointly optimized during training, enabling it to predict masks along with accurate boundaries. An inner-net boundary-computing module is incorporated in the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net for producing spontaneously mask-consistent boundaries. We analyze the influence of parameter settings, network constructions of the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net , and compare with state-of-the-art algorithms on three widely-adopted datasets. Experimental results show that the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net achieves state-of-the-art performance, demonstrating the effectiveness of its mask-boundary-consistent network structure. We also propose that the boundary module in <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net has high compatibility, and can be easily adapted to other segmentation-related techniques.

0 Replies