Abstract: Surface defect detection in strip steel is a critical task in industrial quality control. However, existing methods struggle to capture both local details and global context effectively. In this paper, we propose the Global-Local Fusion Network (GLNet) for strip steel surface defect detection, which combines the global feature extraction of VMamba with the local feature modeling of CNNs. GLNet employs an encoder-decoder structure in which the encoder consists of two parallel branches: one based on VMamba for capturing global features and the other based on ResNet50 for extracting local features. In the decoder, a Global-Local Fusion (GLF) module integrates these features using the Cross Prototype Objective Enhancement (CPOE) and Selective Spatial and Channel Attention (SSCA) modules. The CPOE module facilitates interaction and fusion between global and local features, while the SSCA module mines multi-scale information from the global features through dynamic attention to guide feature aggregation. Extensive experiments on the ESDIs dataset demonstrate that GLNet achieves state-of-the-art performance in defect detection, surpassing 13 existing methods in both quantitative metrics and qualitative comparisons.