Abstract: In industrial automation, reliably recognizing the state of partially assembled equipment is crucial for robotic assembly, maintenance, and quality control. Progress in this area has been hampered by two major challenges: the absence of a comprehensive real-world dataset for complex industrial assemblies, and the persistent domain gap between synthetic training data and real operating conditions. In this paper, we present two key contributions. First, we introduce a large-scale dataset of real industrial assemblies, comprising 90 scenes from 6 diverse pieces of equipment with over 700 parts, that provides detailed ground truth for assembly state. Second, we propose a novel two-stage recognition approach that uses state-of-the-art monocular depth estimation as a preprocessing step, which reduces the synthetic-to-real domain gap and improves recognition performance. Extensive experiments validate our approach, which delivers robust 6D pose estimation and part classification in challenging industrial settings. Code, data, and pretrained weights are available at github.com/overlab-kevin/assembly_depth.
External IDs: dblp:conf/case/MurrayD25