MFF-SDD: A Bidirectional Guidance and Multiscale Multimodal Fusion Model for Small Defect Detection in Industrial Films

Huiyan Wang, Ruihao Peng, Ming Ying, Fashuai Li, Jiuyi Zhang, Xiaolan Li, Yan Tian, Guofeng Zhang

Published: 01 Jan 2025, Last Modified: 04 Nov 2025. Venue: IEEE Transactions on Industrial Informatics. License: CC BY-SA 4.0
Abstract: Detecting small defects in industrial biaxially oriented polypropylene (BOPP) films is challenging due to their limited visual cues, low contrast, and high similarity to background textures. Existing deep learning-based methods, which rely solely on visual data, often suffer from high miss rates and poor classification accuracy in such complex scenarios. To address these limitations, we propose the multimodal fusion-based film small defect detection (MFF-SDD) model, specifically designed for small defect detection in BOPP film production. Unlike conventional language-image models that neglect the bidirectional interaction between modalities, MFF-SDD introduces the bidirectional guidance multimodal cross-fusion module, which enhances visual–textual integration through mutual guidance and attention mechanisms, enabling more effective foreground focus and background suppression. Furthermore, the text and image cross-modality neighbors multiscale fusion module employs multiscale cross-modal fusion to preserve fine-grained details and exploit complementary features from both modalities, improving detection accuracy and reducing misclassification of small defects. We also present the film small defect (FSD) dataset, comprising 10 385 annotated bounding boxes across seven defect categories. Experimental results show that MFF-SDD surpasses state-of-the-art methods by 2.31% in average precision and 3.75% in recall on the FSD dataset and achieves leading performance on public benchmarks, such as PASCAL VOC and TinyPerson. These findings demonstrate the effectiveness and robustness of our approach in multimodal small defect detection. Our dataset and code will be made publicly available upon acceptance of this article for publication.
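The abstract gives no implementation details for the bidirectional guidance module, so the following is only a minimal sketch of the general idea of mutual cross-attention between image and text features. It is not the MFF-SDD architecture. The function names (`cross_attend`, `bidirectional_fusion`), the residual addition, the token counts, and the feature width are all assumptions made for illustration. Learned projection matrices are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context):
    """Scaled dot-product attention: each query row becomes a
    weighted mix of the context rows (no learned projections)."""
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_queries, n_context)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights

def bidirectional_fusion(img_tokens, txt_tokens):
    """Hypothetical two-way guidance: image tokens attend to text
    tokens and text tokens attend to image tokens. Each result is
    added back to its own stream (an assumed residual connection)."""
    txt_guided_img, w_img_to_txt = cross_attend(img_tokens, txt_tokens)
    img_guided_txt, w_txt_to_img = cross_attend(txt_tokens, img_tokens)
    fused_img = img_tokens + txt_guided_img
    fused_txt = txt_tokens + img_guided_txt
    return fused_img, fused_txt, w_img_to_txt, w_txt_to_img

# Illustrative sizes only: 16 image patches, 7 text tokens, width 8.
rng = np.random.default_rng(0)
img = rng.normal(size=(16, 8))
txt = rng.normal(size=(7, 8))
fused_img, fused_txt, w_i2t, w_t2i = bidirectional_fusion(img, txt)
```

In this sketch each modality keeps its own token count after fusion, and the attention weights show which tokens of the other modality guided each position. How the actual model combines, projects, or scales these streams is not stated in the abstract.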