Keywords: Precision Agriculture, Selective Harvesting, Computer Vision, Anomaly Detection
TL;DR: Exploring representation learning from unlabeled data to build a visual detector of anomalous fruits
Abstract: Recently, self-supervised learning methods have been proposed to learn useful representations for the visual detection of anomalous, unhealthy crops by training a neural network to classify augmented images of normal instances, which are relatively easy to obtain. These pipelines are largely designed within the one-class classification paradigm, in which all training samples belong to the normal (negative) class, reflecting the severe scarcity of anomalous (positive) observations in realistic scenarios. In this paper, we study whether this "homogeneity" of the training set is actually necessary to boost the performance of the learned detector, because otherwise "unlabeled" data newly gathered from the field could simply be utilized during training without the need for expensive human annotation. To be specific, we first explore scenarios that treat every unlabeled instance as a normal one while the proportion of anomalous samples in the unlabeled set varies. We also introduce an iterative training procedure for "negative-unlabeled" learning, in which the unlabeled data are incrementally labeled based on model predictions so that a one-class classifier is trained on the samples regarded as potentially normal. Our experiments use CH-Rand, a state-of-the-art method for learning useful representations for anomaly detection on fruit images, on the Riseholme-2021 dataset, which includes a number of healthy and unhealthy strawberry images collected under realistic conditions. The results show that using an unlabeled set as normal data can lead to an 8.7% performance improvement without any labeling effort, even though 4% of the set consists of anomalous strawberries. In addition, our iterative training can benefit trained anomaly detectors by automatically filtering out unlabeled anomalies, reducing the overall anomaly ratio in the unlabeled data from 6% to 4.3% and consequently leading to better detection performance.
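The iterative negative-unlabeled procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `OneClassScorer` is a hypothetical stand-in (a toy centroid-distance model) for a learned one-class detector such as one built on CH-Rand, and `keep_frac` is an assumed hyperparameter controlling how aggressively unlabeled samples are absorbed as "potentially normal".

```python
import numpy as np

class OneClassScorer:
    """Hypothetical placeholder for a one-class anomaly detector.

    Toy model: fit a centroid to the normal samples and score new
    samples by their distance from it (higher score = more anomalous).
    """

    def fit(self, X):
        self.center = X.mean(axis=0)
        return self

    def score(self, X):
        return np.linalg.norm(X - self.center, axis=1)


def iterative_nu_training(X_normal, X_unlabeled, rounds=3, keep_frac=0.9):
    """Iterative negative-unlabeled training (sketch).

    Each round: score the unlabeled pool with the current model, treat
    the lowest-scoring fraction as potentially normal, and retrain on
    the labeled normals plus those samples.
    """
    model = OneClassScorer().fit(X_normal)
    train = X_normal
    for _ in range(rounds):
        scores = model.score(X_unlabeled)
        thresh = np.quantile(scores, keep_frac)      # cut-off for "likely normal"
        likely_normal = X_unlabeled[scores <= thresh]
        train = np.vstack([X_normal, likely_normal])  # grow the negative set
        model = OneClassScorer().fit(train)
    return model, train
```

In this toy setup, unlabeled anomalies score far from the normal centroid and are filtered out of the training pool, mirroring the paper's observation that iterative training lowers the effective anomaly ratio in the unlabeled data.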