Transformer-Based Multiscale Reconstruction Network for Defect Detection of Infrared Images

Published: 2024, Last Modified: 09 Nov 2025. Venue: IEEE Trans. Instrum. Meas. 2024. License: CC BY-SA 4.0
Abstract: Bottle packaging is extensively used in manufacturing, and inspecting aluminum foil sealing during filling is crucial for ensuring product quality. Traditional machine vision methods based on supervised learning require extensive annotated data, but the scarcity of defective samples hampers the effectiveness of these methods. To address this challenge, unsupervised learning methods have emerged. Despite their potential, these methods often struggle to accurately learn the distribution of normal samples, resulting in higher rates of false positives and negatives. This article proposes an unsupervised learning-based approach for anomaly detection in infrared images. Specifically, we construct a transformer-based multiscale image reconstruction network (TMIRN) that includes a feature extraction module, a feature fusion module, a reconstruction module, a discriminator network, and an anomaly scoring module. By effectively combining Transformer and convolutional neural network (CNN) techniques, the proposed network excels at capturing both global and local semantic information. Its multiscale structure accurately localizes defects of varying sizes and combines image-level and feature-level anomaly scores to mitigate the impact of nonuniform distribution and noise. Experimental results on the infrared image dataset for aluminum foil sealing demonstrate high accuracy in anomaly detection and localization. Furthermore, on the industrial MVTec AD dataset, our TMIRN exhibits superior generalization and detection compared to state-of-the-art reconstruction networks.
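The abstract says that image-level and feature-level anomaly scores are combined, but it does not give the fusion rule. The sketch below is one plausible reading, not the authors' implementation. It assumes a squared reconstruction error at the pixel level, a channel-averaged squared error at each feature scale, nearest-neighbour upsampling to image size, and a weighted sum of min-max normalized maps. The function names and the weight `alpha` are hypothetical.

```python
import numpy as np

def min_max(x, eps=1e-8):
    """Scale a map to roughly [0, 1]; eps avoids division by zero."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.min()) / (x.max() - x.min() + eps)

def upsample_nearest(fmap, out_hw):
    """Nearest-neighbour resize of a 2-D map to shape out_hw."""
    h, w = fmap.shape
    H, W = out_hw
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return fmap[rows[:, None], cols[None, :]]

def combined_anomaly_map(image, recon, feats, recon_feats, alpha=0.5):
    """Fuse pixel-level and multiscale feature-level errors (assumed rule).

    image, recon: arrays of shape (H, W).
    feats, recon_feats: lists of arrays of shape (C, h, w), one per scale.
    alpha: hypothetical weight of the image-level term.
    """
    H, W = image.shape
    image_err = (image - recon) ** 2
    feat_err = np.zeros((H, W))
    for f, g in zip(feats, recon_feats):
        err = np.mean((f - g) ** 2, axis=0)        # average over channels
        feat_err += upsample_nearest(err, (H, W))  # bring to image size
    feat_err /= len(feats)
    return alpha * min_max(image_err) + (1 - alpha) * min_max(feat_err)

# Toy example: one hot pixel that the reconstruction fails to reproduce.
image = np.zeros((8, 8))
image[2, 3] = 1.0
recon = np.zeros((8, 8))
feat = np.zeros((1, 4, 4))
feat[0, 1, 1] = 1.0            # covers rows 2-3, cols 2-3 at image scale
recon_feat = np.zeros((1, 4, 4))

amap = combined_anomaly_map(image, recon, [feat], [recon_feat])
peak = np.unravel_index(np.argmax(amap), amap.shape)
image_score = amap.max()       # one simple image-level score: the map maximum
```

In this toy case the pixel map and the upsampled feature map agree only at the defect location, so the fused map peaks there. A response that shows up in only one of the two maps is scaled down by its weight, which is one way a combined score could reduce the effect of noise.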