Empirical Performance of Deep Learning Models with Class Imbalance for Crop Disease Classification

Sèton Calmette Ariane Houetohossou; Castro Gbêmêmali Hounmenou; Vinasétan Ratheil Houndji; Romain Glèlè Kakaï

Published: 01 Jan 2024; Last Modified: 01 Apr 2025; DeLTA (2) 2024; CC BY-SA 4.0

Abstract: Class imbalance refers to a situation where observations are unequally distributed across the classes of a dataset. It is frequently encountered in agriculture, particularly in the classification of crop diseases. Class imbalance makes deep learning models harder to train, as they may become biased toward the majority class and perform poorly on the minority class. One common way to address it is resampling, either by oversampling the minority class or by undersampling the majority class. This study examined the performance of three deep learning architectures (GoogleNet, VGG16, and ResNet50) for disease classification of tomatoes, peppers, and peaches under class imbalance. Data were collected online from two sources (PlantVillage and PlantDisease). Each model was trained using transfer learning and evaluated in three situations: without balancing, with Random Over Sampling (ROS), and with Random Under Sampling (RUS). The batch size and the number of epochs were set to 32 and 10, respectively. Recall, F1 score, Area Under the Receiver Operating Characteristic Curve, and computing time were recorded. Results indicated that RUS significantly improves the precision, recall, and F1 score of GoogleNet, despite a longer processing time than ROS. For VGG16, ROS proves superior in terms of both learning time and performance. Both ROS and RUS enable ResNet50 to maintain high performance as class imbalance increases. Moreover, GoogleNet produced more stable results than VGG16 and ResNet50, especially across varying levels of imbalance. This study highlights the importance of data balancing while acknowledging certain limitations, such as the size of the datasets and the model parameters used, paving the way for future research to optimize these methods.
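
The two resampling strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. The function names, the class labels, and the choice to balance every class to the largest (ROS) or smallest (RUS) class size are assumptions made for illustration only. The functions return sample indices, so they can be applied to any list of images or file paths.

```python
import random
from collections import Counter, defaultdict


def _indices_by_class(labels):
    """Group sample indices by their class label (hypothetical helper)."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def random_oversample(labels, seed=0):
    """ROS sketch: keep every sample and add random duplicates
    (drawn with replacement) until each class matches the largest class."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = max(len(members) for members in groups.values())
    picked = []
    for members in groups.values():
        picked.extend(members)
        picked.extend(rng.choices(members, k=target - len(members)))
    return picked


def random_undersample(labels, seed=0):
    """RUS sketch: draw, without replacement, as many samples from each
    class as the smallest class contains."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    target = min(len(members) for members in groups.values())
    picked = []
    for members in groups.values():
        picked.extend(rng.sample(members, target))
    return picked


# Illustrative labels only; these are not the class counts used in the study.
labels = ["healthy"] * 6 + ["blight"] * 2 + ["mosaic"] * 1
ros = random_oversample(labels)
rus = random_undersample(labels)
ros_counts = Counter(labels[i] for i in ros)
rus_counts = Counter(labels[i] for i in rus)
```

In this toy case ROS grows the dataset from 9 to 18 samples and RUS shrinks it to 3, which shows the trade-off between the two: ROS keeps all data but repeats minority samples, while RUS discards majority samples. Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.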
