Abstract: In densely packed scenes, the high density of objects and their varying sizes make it considerably harder to detect every instance accurately and without duplication than in conventional object detection settings. In this paper, we propose a YOLOv5-based object detection approach equipped with a Transformer-based head and an EM-Merger unit specifically designed for densely packed scenes. We incorporate the Transformer architecture into the prediction heads to provide a self-attention mechanism that captures long-range dependencies between densely packed objects. In addition, we introduce an EM-Merger unit to resolve redundant detections. Experimental results on the RebarDSC and SKU110K datasets demonstrate that our method significantly outperforms the baseline approach, achieving new state-of-the-art detection performance.
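To make the core idea concrete, the following is a minimal sketch of a prediction head whose feature map is refined by a standard Transformer encoder layer (self-attention over spatial positions) before the final 1x1 prediction convolution. The class name, layer choices, and hyperparameters here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class TransformerPredictionHead(nn.Module):
    """Illustrative sketch (not the authors' code): a YOLO-style prediction head
    whose features pass through one Transformer encoder layer, so every spatial
    position can attend to every other position before predictions are made."""

    def __init__(self, channels: int, num_outputs: int, num_heads: int = 4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.pred = nn.Conv2d(channels, num_outputs, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Flatten the spatial grid into a token sequence: (B, H*W, C).
        tokens = x.flatten(2).transpose(1, 2)
        # Self-attention across all grid positions captures long-range dependencies.
        tokens = self.encoder(tokens)
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        # Per-cell box / objectness / class predictions, as in a YOLO head.
        return self.pred(x)

# Example usage with assumed sizes: 256-channel neck features, 3 anchors x 85 outputs.
head = TransformerPredictionHead(channels=256, num_outputs=3 * 85)
out = head(torch.randn(2, 256, 20, 20))
print(out.shape)  # torch.Size([2, 255, 20, 20])
```

The EM-Merger stage would then operate on the resulting dense detections, clustering overlapping boxes instead of relying solely on non-maximum suppression; its details are not sketched here.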