Abstract: Video supervoxel segmentation is a critical technique in computer vision, facilitating accurate object segmentation and boundary detection in video analysis. Current methods often struggle to balance segmentation accuracy with computational efficiency. This paper proposes an adaptive supervoxel segmentation method for videos, guided by energy-driven bottom-up clustering. By treating each pixel as a potential supervoxel and iteratively merging them based on segmentation energy, our method efficiently generates supervoxels of varying sizes that align well with object boundaries. An optimization algorithm further refines the supervoxels, enhancing shape regularity and boundary smoothness. Extensive comparisons with traditional and deep learning-based methods demonstrate the superior performance of our approach in terms of segmentation accuracy, boundary preservation, and efficiency. The proposed method holds promise for practical applications in video analysis and understanding.
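The bottom-up merging idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the segmentation energy, so the sketch assumes a simple one, E = (sum of within-region squared intensity error) + lam * (number of regions), and merges two adjacent regions whenever doing so lowers E. The function name `merge_supervoxels`, the parameter `lam`, the grayscale nested-list video format, and the 6-connected neighbourhood are all hypothetical choices made for illustration, and the refinement step for shape regularity is omitted.

```python
from itertools import product


def merge_supervoxels(video, lam):
    """Greedy bottom-up merging on a T x H x W grayscale video (nested lists).

    Assumed energy (NOT taken from the paper):
        E = within-region squared intensity error + lam * number of regions
    Returns a T x H x W nested list of region labels.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])

    def idx(t, y, x):
        # Flatten (frame, row, column) into a single voxel index.
        return (t * H + y) * W + x

    coords = list(product(range(T), range(H), range(W)))
    parent = list(range(T * H * W))  # union-find forest: every voxel starts alone
    size = [1] * (T * H * W)         # voxel count per root region
    total = [float(video[t][y][x]) for t, y, x in coords]  # intensity sum per root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # 6-connected adjacency: right, down, and next-frame neighbours.
    edges = []
    for t, y, x in coords:
        if x + 1 < W:
            edges.append((idx(t, y, x), idx(t, y, x + 1)))
        if y + 1 < H:
            edges.append((idx(t, y, x), idx(t, y + 1, x)))
        if t + 1 < T:
            edges.append((idx(t, y, x), idx(t + 1, y, x)))

    changed = True
    while changed:  # sweep until no merge lowers the assumed energy
        changed = False
        for u, v in edges:
            a, b = find(u), find(v)
            if a == b:
                continue
            na, nb = size[a], size[b]
            mean_gap = total[a] / na - total[b] / nb
            # Energy change of merging a and b: rise in squared error
            # minus the lam saved by having one region fewer.
            delta = na * nb / (na + nb) * mean_gap ** 2 - lam
            if delta < 0:
                parent[b] = a
                size[a] = na + nb
                total[a] += total[b]
                changed = True

    return [[[find(idx(t, y, x)) for x in range(W)] for y in range(H)]
            for t in range(T)]
```

Under this assumed energy, a larger `lam` penalizes the region count more heavily and yields fewer, larger supervoxels, while a smaller `lam` keeps regions tighter around intensity boundaries. The sweep order here is a simplification; a best-first merge order would be an equally plausible reading of "iteratively merging".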