Study on Mask R-CNN with Data Augmentation for Retail Product Detection

Matthew Kuo, Hung-Tse Chan, Chih-Hsien Hsia

Published: 2021, Last Modified: 13 Nov 2024, ISPACS 2021, CC BY-SA 4.0

Abstract: In deep learning (DL), object detection is a computer vision technology that can be applied in many fields, such as detecting faces and tracking the soccer ball during a match. Previous studies on detecting objects in small retail stores have mostly used Faster R-CNN with a ResNet101 backbone. However, this method sometimes identifies overlapping objects, or objects whose edges are partly cut off by the image boundary, incorrectly or ineffectively. This paper uses Mask R-CNN with data augmentation (DA) to perform object detection for retail product recognition. We seek to minimize the errors caused by overlapping objects or by adjacent objects with similar colors. Experimental results show that the proposed method achieves an mAP of 98.92% on the Snacks dataset, a higher recognition rate than state-of-the-art methods.