Abstract: You Only Look Once (YOLO) is a state-of-the-art object detection model with a novel architecture that balances model complexity against inference time. Among YOLO versions, YOLOv7 has a lightweight backbone network, E-ELAN, that allows the model to learn more efficiently without disrupting the gradient path. However, YOLOv7 struggles to classify objects whose shape and texture are similar, as is common with personal protective equipment (PPE). For instance, the Glass and NoGlass PPE classes appear almost identical when images are captured from a distance. To mitigate this issue and further improve the classification performance of YOLOv7, a modified version, referred to as the contrastive-based model, is introduced in this work. The core idea is to add a contrastive loss branch that helps the YOLOv7 model differentiate classes by pushing instances of different classes apart in the embedding space. To validate the effectiveness of the proposed contrastive-based YOLO, it has been evaluated on two datasets: CHV and our own indoor-collected dataset, JRCAI. Both datasets contain 12 types of PPE classes, and we have annotated both for the 12 studied PPE objects. The experimental results show that the proposed model outperforms the standard YOLOv7 model by 2% in mAP@0.5. Furthermore, the proposed model outperforms other YOLO variants as well as cutting-edge object detection models such as YOLOv8, Faster-RCNN, and DAB-DETR.
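The abstract describes the contrastive branch only at a high level. As a rough illustration of the general idea (not the paper's exact loss), a classic pairwise margin-based contrastive loss pulls same-class embeddings together and pushes different-class embeddings at least a margin apart; the function name, toy embeddings, and margin value below are illustrative assumptions, sketched in plain Python:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Generic pairwise contrastive loss (illustrative, not the paper's):
    same-class pairs are penalized by their squared distance (pulled together);
    different-class pairs are penalized only when closer than `margin`
    (pushed apart)."""
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical 2-D embeddings for two visually similar PPE classes
glass   = [0.90, 0.10]
noglass = [0.80, 0.20]   # lies close to `glass` in embedding space

loss_pos = contrastive_loss(glass, [0.85, 0.15], same_class=True)
loss_neg = contrastive_loss(glass, noglass, same_class=False)
# A different-class pair inside the margin yields a large penalty,
# which is the pressure that separates look-alike classes.
```

In a detector such as YOLOv7 this kind of term would be computed on per-instance embedding vectors and added to the usual detection losses, so that training simultaneously optimizes localization and class separation.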