AI-MedLeafX: a large-scale computer vision dataset for medicinal plant diagnosis

Md. Fahim Ferdous, Faysal Bin Khaled Nissan, Nur Muhammad Nibir, Md. Hasan Imam Bijoy

Published: 01 Oct 2025, Last Modified: 11 Nov 2025. Venue: Data in Brief. License: CC BY-SA 4.0
Abstract: This study presents a large, curated, and manually validated dataset for classifying leaf quality into five categories: Healthy, Bacterial Spot, Shot Hole, Yellow, and Powdery Mildew. The dataset covers four plant species: Cinnamomum camphora (Camphor), Terminalia chebula (Haritaki), Moringa oleifera (Sojina), and Azadirachta indica (Neem). Each species is represented across three or four of these categories, depending on the symptoms observed, giving 13 classes in total. Data collection took place between November 1, 2024, and January 5, 2025, using four different mobile cameras to ensure diversity in image resolution, lighting, and environmental conditions. The original dataset comprised 10,858 high-resolution images, which were subsequently expanded to 65,148 through six data augmentation techniques, including rotations (45°, 60°, and 90°), horizontal flipping, zooming, and brightness adjustment. All images were standardized to 512×512 pixels for uniformity and compatibility with machine learning and computer vision models. The dataset is intended as a resource for developing automated plant disease detection systems and for supporting precision agriculture. It addresses the need for scalable, high-quality data in agricultural research and provides a basis for benchmarking deep learning architectures. By enabling more accurate and efficient leaf disease classification, the dataset can contribute to tree health monitoring, improved crop yield, and sustainable agricultural practices.
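The augmentation and resizing steps described in the abstract can be sketched as follows. This is a minimal illustration using Pillow, not the authors' pipeline: the abstract does not name a library or give zoom and brightness factors, so ZOOM_FACTOR, BRIGHTNESS_FACTOR, and the helper names center_zoom and augment are assumptions made for this example.

```python
from PIL import Image, ImageEnhance, ImageOps

TARGET_SIZE = (512, 512)  # final image size stated in the abstract

# Assumed values: the abstract gives no zoom or brightness factors.
ZOOM_FACTOR = 1.2
BRIGHTNESS_FACTOR = 1.3


def center_zoom(img, factor):
    """Zoom in by cropping the central 1/factor region and scaling it back up."""
    w, h = img.size
    cw, ch = int(w / factor), int(h / factor)
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch)).resize((w, h))


def augment(img):
    """Return six augmented variants of img, each resized to TARGET_SIZE."""
    variants = {
        # rotate() keeps the canvas size, so 45 and 60 degree turns
        # leave black corners (an assumption about how rotation was done).
        "rot45": img.rotate(45),
        "rot60": img.rotate(60),
        "rot90": img.rotate(90),
        "hflip": ImageOps.mirror(img),  # horizontal flip
        "zoom": center_zoom(img, ZOOM_FACTOR),
        "bright": ImageEnhance.Brightness(img).enhance(BRIGHTNESS_FACTOR),
    }
    return {name: v.resize(TARGET_SIZE) for name, v in variants.items()}


# Usage on a synthetic image standing in for a leaf photo.
sample = Image.new("RGB", (800, 600), (40, 120, 40))
augmented = augment(sample)
```

Six variants per source image give 6 × 10,858 = 65,148, which matches the reported total; the abstract does not say whether the unaugmented originals are counted within that figure.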