Lightweight Multimodal Defect Detection at the Edge via Cross-Modal Distillation

Published: 01 Jan 2024, Last Modified: 15 May 2025 · IWQoS 2024 · CC BY-SA 4.0
Abstract: The representational capacity of single-modality images is often severely limited and fails to meet the requirements of complex defect detection in industrial settings. For instance, visible-light images are susceptible to environmental factors such as lighting and occlusion, while infrared images cannot capture texture details due to their low spatial resolution. Consequently, employing multiple image modalities typically yields better results than relying on a single modality. However, processing data from multiple modalities inevitably introduces additional computational cost, placing high hardware demands on edge computing devices in environments where real-time detection is critical. To address these challenges, we propose a multimodal distillation approach that uses visible and infrared images as inputs to train a complex teacher model, while the student model continues to operate on a single-modality input. Through knowledge transfer and model lightweighting, the student acquires multimodal feature information while still meeting real-time performance requirements.
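The cross-modal distillation objective described above can be sketched as a weighted combination of a soft matching term against the multimodal teacher and a hard supervised term. The abstract does not specify the paper's actual loss, architecture, or hyperparameters, so the KL-based formulation, temperature `T`, and weight `alpha` below are illustrative assumptions (a standard knowledge-distillation recipe), not the authors' exact method:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hypothetical distillation loss: the single-modality student mimics the
    soft outputs of the visible+infrared teacher (KL term) while also fitting
    ground-truth labels (cross-entropy term). T and alpha are assumed values."""
    p_t = softmax(teacher_logits, T)   # teacher soft targets
    p_s = softmax(student_logits, T)   # student soft predictions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    p_hard = softmax(student_logits)   # T=1 for the supervised term
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    # T**2 rescales the soft-term gradient magnitude, as in standard distillation
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

At training time the teacher sees both modalities while the student sees only the visible image, so only the lightweight single-modality student needs to run on the edge device at inference.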