A transformer-based mask R-CNN for tomato detection and segmentation

Published: 01 Jan 2023, Last Modified: 15 Oct 2024. Venue: J. Intell. Fuzzy Syst. 2023. License: CC BY-SA 4.0
Abstract: Fruit detection is essential for harvesting robot platforms. However, complicated environmental conditions such as illumination variation and occlusion make fruit detection a challenging task. In this study, a Transformer-based Mask region-based convolutional neural network (Mask R-CNN) model for tomato detection and segmentation is proposed to address these difficulties. Swin Transformer is used as the backbone network for better feature extraction. Multi-scale training techniques are shown to yield significant performance gains. Apart from accurately detecting and segmenting tomatoes, the method effectively identifies tomato cultivars (normal-size and cherry tomatoes) and tomato maturity stages (fully-ripened, half-ripened, and green). Compared with existing work, the method achieves the best detection and segmentation performance for these tomatoes, with mean average precision (mAP) results of 89.4% and 89.2%, respectively.
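The abstract does not say how multi-scale training was configured. As a minimal sketch of the general idea, one common approach picks a random target short side for each training image and resizes it with the aspect ratio preserved. The function name `sample_training_size`, the candidate short sides, and the long-side cap below are illustrative assumptions, not values taken from the paper.

```python
import random


def sample_training_size(width, height, short_sides=(480, 640, 800),
                         max_long_side=1333, rng=random):
    """Return a resized (width, height) for one training iteration.

    A target short side is drawn at random from `short_sides` (assumed
    values, not the paper's), and the image is scaled so that its short
    side matches the target unless that would push the long side past
    `max_long_side` (also an assumed value).
    """
    target_short = rng.choice(short_sides)
    short, long_ = min(width, height), max(width, height)
    # Use the smaller of the two scale factors so both limits hold.
    scale = min(target_short / short, max_long_side / long_)
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_training_size(1920, 1080, rng=rng))
```

In a real pipeline the returned size would be passed to the image resize step, with boxes and masks rescaled by the same factor.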