Abstract: Video object segmentation has attracted considerable research interest these years. Top-performing video object segmentation methods mainly rely on fully convolutional neural networks which are specifically trained for predicting high-performance masks, resulting in a lack of preciseness in boundary details. This paper tackles the problem of predicting both mask-accurate and boundary-precise segmentation masks in videos. To solve this problem, we propose a simple and efficient network structure: the Mask-boundAry-Consistent Network ( <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net</i> ). The <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net</i> is an end-to-end fully convolutional network, where both mask and boundaries are jointly optimized during training, enabling it to predict masks along with accurate boundaries. An inner-net boundary-computing module is incorporated in the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net</i> for producing spontaneously mask-consistent boundaries. We analyze the influence of parameter settings, network constructions of the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net</i> , and compare with state-of-the-art algorithms on three widely-adopted datasets. Experimental results show that the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net</i> achieves state-of-the-art performance, demonstrating the effectiveness of its mask-boundary-consistent network structure. We also propose that the boundary module in <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">MAC-Net</i> has high compatibility, and can be easily adapted to other segmentation-related techniques.
0 Replies
Loading