Abstract: Existing superpixel segmentation algorithms mainly focus on high-quality natural images and neglect the environmental constraints that are inevitable in complex scenes. In this paper, we propose an end-to-end frequency domain guided superpixel segmentation network (FSNet) that generates superpixels with sharp boundary adherence for complex scenes by fusing deep features from the spatial and frequency domains. To exploit the frequency domain information of the image, we propose an improved frequency information extractor (IFIE) that extracts frequency domain information with sharp boundary features. Moreover, because over-sharpened features may damage the semantic information of superpixels, we further design a dense hybrid atrous convolution (DHAC) block that preserves semantic information by capturing wider and deeper semantic context in the spatial domain. Finally, the deep features extracted in the spatial and frequency domains are fused to generate semantics-aware superpixels with sharp boundary adherence. Extensive experiments on multiple challenging datasets with complex boundaries demonstrate that our method achieves state-of-the-art performance both quantitatively and qualitatively, and we further verify its superiority when applied to salient object detection.
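As a rough illustration of why frequency domain information can carry sharp boundary cues, the sketch below applies a plain FFT high-pass filter to a toy step image and compares the response near the edge with the response in a flat region. This is not the IFIE described above, whose design is not specified in the abstract; the function name `high_pass`, the `cutoff` value, and the toy image are all assumptions made for the example.

```python
import numpy as np

def high_pass(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Zero out spatial frequencies below `cutoff` (cycles per pixel).

    Hypothetical helper for illustration only; not the paper's IFIE.
    """
    spectrum = np.fft.fft2(image)
    # Frequency coordinates of every FFT bin along each axis.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Suppress the low-frequency (smooth) content, keep the rest.
    spectrum[radius < cutoff] = 0
    return np.real(np.fft.ifft2(spectrum))

# Toy image: dark left half, bright right half (one vertical boundary).
image = np.zeros((32, 32))
image[:, 16:] = 1.0

response = np.abs(high_pass(image))
edge_energy = response[:, 14:18].mean()   # columns around the boundary
flat_energy = response[:, 6:10].mean()    # columns inside the flat region
print("edge response larger than flat response:", edge_energy > flat_energy)
```

The high-pass response concentrates around the intensity jump, which is the intuition behind using frequency domain features as a boundary cue; a learned extractor would replace the fixed cutoff used here.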