LyFormer: Context-aware feature fusion for industrial small-object detection

18 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Industrial Vision, PCB/SMT Inspection, Feature Fusion, Transformer Block, YOLO Extension, Attention Mechanism, Domain Adaptation, Few-shot Transfer, Class Imbalance, Ablation Study, mAP, Density Score, Counting Accuracy, Robust Preprocessing, Noise-Resilience
Abstract: Accurate detection of small electronic components, such as semiconductors and printed circuit board (PCB) elements, is crucial for maintaining product quality and operational efficiency in surface mount technology (SMT) assembly lines. However, existing YOLO-based detection frameworks, while effective in general scenarios, often struggle with small, visually ambiguous objects under complex backgrounds, variable illumination, and subtle visual distinctions. To address these challenges, we propose \textbf{LyFormer}, a YOLOv8s-based framework that integrates four specialized modules: (1) an Adaptive Multi-level Preprocessing Module (AMPM) for dynamic image preprocessing, (2) a Spatial Relation-aware Image Segmentation Patch (SRISP) for precise object localization, (3) a Fine-grained Cue Extraction Module (FCEM) for amplifying subtle texture details, and (4) a Context-aware Transformer Module (CaT) for integrating global and local contextual information. This modular design significantly improves detection accuracy while maintaining real-time performance. Experiments on real-world SMT production line X-ray images of semiconductor reels demonstrate that LyFormer achieves a mean Average Precision (mAP@0.5) of 0.672, substantially outperforming the baseline YOLOv8s (mAP@0.5: 0.399). These results confirm LyFormer’s accuracy and robustness for small, densely packed components in challenging industrial environments.
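The abstract names four modules and the order in which they act, but not their interfaces. The sketch below is a hypothetical illustration of that pipeline order only; every class name, method signature, and placeholder operation is an assumption, not the authors' implementation.

```python
# Hedged sketch of the LyFormer pipeline order described in the abstract.
# All class bodies are placeholders; the real modules operate on tensors
# inside a YOLOv8s backbone, which is not reproduced here.

class AMPM:
    """Adaptive Multi-level Preprocessing Module: dynamic image preprocessing."""
    def __call__(self, image):
        # Placeholder min-max normalization standing in for adaptive preprocessing.
        lo = min(min(row) for row in image)
        hi = max(max(row) for row in image)
        scale = (hi - lo) or 1
        return [[(p - lo) / scale for p in row] for row in image]

class SRISP:
    """Spatial Relation-aware Image Segmentation Patch: coarse localization."""
    def __call__(self, image):
        # Placeholder: treat the whole image as a single candidate patch.
        return [image]

class FCEM:
    """Fine-grained Cue Extraction Module: amplify subtle texture details."""
    def __call__(self, patches):
        # Placeholder: identity pass-through for texture features.
        return patches

class CaT:
    """Context-aware Transformer Module: fuse global and local context."""
    def __call__(self, features):
        # Placeholder: one score per patch (mean intensity), standing in
        # for context-aware detection outputs.
        return [sum(map(sum, p)) / (len(p) * len(p[0])) for p in features]

class LyFormerSketch:
    """Composes the four modules in the order the abstract describes."""
    def __init__(self):
        self.stages = [AMPM(), SRISP(), FCEM(), CaT()]

    def __call__(self, image):
        x = image
        for stage in self.stages:
            x = stage(x)
        return x

scores = LyFormerSketch()([[0, 128], [128, 255]])
```

The point of the sketch is the modular composition: each stage consumes the previous stage's output, so any module can be ablated or swapped independently, which matches the ablation-study framing in the keywords.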
Primary Area: applications to computer vision, audio, language, and other modalities
Supplementary Material: zip
Submission Number: 10909