Abstract: Semantic segmentation of side-scan sonar images (SSS-Seg) is an emerging topic and plays an important role in sonar image interpretation. However, due to the interference of seabed reverberation noise, complex background information, and the unique characteristics of sonar images, directly applying semantic segmentation methods designed for natural scene images to SSS-Seg fails to achieve satisfactory results. Existing challenges in SSS image semantic segmentation include the inability to effectively distinguish between similar objects, sensitivity to noise, and the loss of critical feature details during segmentation. In this article, we propose a novel cross-scale feature interaction network (CSFINet) to address these challenges and achieve semantic segmentation of different underwater objects in SSS images. Specifically, the cross-scale feature selection module filters spatial detail features and abstracts semantic information. The multiscale attention mechanism captures relationships between features at different scales. To address feature loss during transfer, the global information modeling module extracts global contextual features and suppresses background noise. In addition, the branch feature fusion module efficiently fuses valuable features from different levels to improve segmentation accuracy and confidence. To verify the effectiveness of CSFINet, we conducted extensive experiments on a real-scene underwater sonar image dataset. Our method achieved a mean intersection over union (mIoU) of 82.84% and a mean pixel accuracy (mPA) of 89.37%, outperforming several state-of-the-art methods, including convolutional neural network-based, Transformer-based, and Mamba-based models.
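For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mean intersection over union and mean pixel accuracy are commonly computed from a per-class confusion matrix. It is an illustration only, not the evaluation code used for CSFINet: the function names are hypothetical, labels are assumed to be flattened integer class indices, and classes absent from the data are assumed to be skipped when averaging, which the abstract does not specify.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou_and_pixel_accuracy(cm):
    """Return (mIoU, mPA) averaged over classes that actually occur.

    Assumption: a class with an empty union (for IoU) or no ground-truth
    pixels (for pixel accuracy) is left out of the corresponding mean.
    """
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]                           # correctly labelled pixels
        gt = sum(cm[c])                         # ground-truth pixels of class c
        pred = sum(cm[r][c] for r in range(n))  # pixels predicted as class c
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)
    if not ious or not accs:
        raise ValueError("confusion matrix contains no labelled pixels")
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example: a 2x2 label map flattened to four pixels, two classes.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
miou, mpa = mean_iou_and_pixel_accuracy(confusion_matrix(y_true, y_pred, 2))
```

In this toy case one of the two class-0 pixels is mislabelled as class 1, so the per-class IoUs are 1/2 and 2/3 and the per-class accuracies are 1/2 and 1.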