Adaptive video supervoxel segmentation via energy-guided bottom-up clustering

Xiao Dong, Zhijie Zhong, Wentao Fan, Zhonggui Chen, Xiaohu Guo, Baorong Yang

Published: 01 Jan 2025, Last Modified: 15 May 2025. Signal Image Video Process. 2025. License: CC BY-SA 4.0

Abstract: Video supervoxel segmentation is a critical technique in computer vision, facilitating accurate object segmentation and boundary detection in video analysis. Current methods often struggle to balance segmentation accuracy with computational efficiency. This paper proposes an adaptive supervoxel segmentation method for videos, guided by energy-driven bottom-up clustering. By treating each pixel as a potential supervoxel and iteratively merging them based on segmentation energy, our method efficiently generates supervoxels of varying sizes that align well with object boundaries. An optimization algorithm further refines the supervoxels, enhancing shape regularity and boundary smoothness. Extensive comparisons with traditional and deep learning-based methods demonstrate the superior performance of our approach in terms of segmentation accuracy, boundary preservation, and efficiency. The proposed method holds promise for practical applications in video analysis and understanding.
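
The bottom-up idea described in the abstract (start with one cluster per pixel, then repeatedly merge neighbors according to an energy) can be illustrated with a minimal sketch. The abstract does not define the segmentation energy, so the code below substitutes a Ward-style cost, the increase in within-cluster squared intensity error, purely as an assumption. The names `merge_cost` and `greedy_supervoxels`, the grayscale `(T, H, W)` input, and the fixed target cluster count are all hypothetical choices for illustration, and the refinement step mentioned in the abstract is not included.

```python
import heapq
import numpy as np


def merge_cost(n_a, mean_a, n_b, mean_b):
    # Assumed energy: increase in within-cluster squared error (Ward's criterion).
    return (n_a * n_b) / (n_a + n_b) * (mean_a - mean_b) ** 2


def greedy_supervoxels(video, n_clusters):
    """Greedy bottom-up merging on a (T, H, W) grayscale volume.

    Returns an integer label array of the same shape.
    """
    T, H, W = video.shape
    n = video.size
    parent = list(range(n))            # parent[i] == i means i is a live cluster
    size = [1] * n
    mean = video.astype(float).ravel().tolist()
    version = [0] * n                  # bumped whenever a cluster changes
    neighbors = [set() for _ in range(n)]

    # 6-connected adjacency across time, rows, and columns.
    idx = np.arange(n).reshape(T, H, W)
    for axis in range(3):
        moved = np.moveaxis(idx, axis, 0)
        for i, j in zip(moved[:-1].ravel().tolist(), moved[1:].ravel().tolist()):
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Heap entries: (cost, a, b, version of a, version of b).
    heap = [(merge_cost(1, mean[i], 1, mean[j]), i, j, 0, 0)
            for i in range(n) for j in neighbors[i] if i < j]
    heapq.heapify(heap)

    remaining = n
    while remaining > n_clusters and heap:
        _, a, b, va, vb = heapq.heappop(heap)
        # Skip stale entries whose clusters were merged or updated since the push.
        if parent[a] != a or parent[b] != b or version[a] != va or version[b] != vb:
            continue
        # Merge b into a and update the running statistics.
        total = size[a] + size[b]
        mean[a] = (size[a] * mean[a] + size[b] * mean[b]) / total
        size[a] = total
        parent[b] = a
        version[a] += 1
        neighbors[a].discard(b)
        for c in neighbors[b]:
            if c != a:
                neighbors[c].discard(b)
                neighbors[c].add(a)
                neighbors[a].add(c)
        neighbors[b] = set()
        # Re-evaluate the cost of merging the new cluster with each neighbor.
        for c in neighbors[a]:
            cost = merge_cost(size[a], mean[a], size[c], mean[c])
            heapq.heappush(heap, (cost, a, c, version[a], version[c]))
        remaining -= 1

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels.reshape(T, H, W)
```

Calling `greedy_supervoxels(video, n_clusters=k)` on a small volume returns `k` connected regions. Because cheap merges happen first, uniform areas collapse into large clusters while high-contrast boundaries are merged last, which is one way to obtain regions of varying size that follow object boundaries. A stopping rule based on an energy threshold rather than a fixed count would be an equally plausible variant.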