Abstract:

Background: Traditional insect species classification relies on taxonomic experts examining unique physical characteristics of specimens, a time-consuming and error-prone process. Machine learning (ML) offers a promising alternative by identifying subtle morphological and genetic differences computationally. However, most existing approaches classify undescribed species as outliers, which limits their utility for biodiversity monitoring.

Objective: This study aims to develop an ML method capable of simultaneously classifying described species and grouping undescribed species by genus, thereby advancing the field of automated insect classification.

Method: We propose a novel ensemble approach combining neural networks (convolutional and attention-based) and Support Vector Machines (SVM), with both DNA barcoding and insect images as input data. To optimize the neural networks for diverse data types, we transform one-dimensional feature vectors into matrices using wavelet transforms. Additionally, a transformer-based architecture integrates DNA barcoding and image features for enhanced classification accuracy.

Experimental Results: Our method was evaluated on a comprehensive dataset containing paired insect images and DNA barcodes for 1,040 species across four insect orders. The results demonstrate superior performance compared to existing methods in classifying described species and grouping undescribed ones by genus.

Conclusion: The proposed approach represents a significant advancement in automated insect classification, addressing both described and undescribed species. This method has the potential to revolutionize global biodiversity monitoring. The MATLAB/PyTorch source code and dataset used are available at https://github.com/LorisNanni/Insect-identification.
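As an illustration of the step in which one-dimensional feature vectors are transformed into matrices using wavelet transforms, the following is a minimal sketch only. The abstract does not say which wavelet, how many decomposition levels, or what matrix layout the authors use, so the Haar wavelet, the two-level decomposition, the row layout, and the function name `haar_feature_matrix` are all assumptions made for this example. The authors' actual implementation is the one in the linked repository.

```python
import math


def haar_feature_matrix(vector, levels):
    """Turn a 1-D feature vector into a (levels + 1) x len(vector) matrix.

    Hypothetical helper, not the authors' code. Rows 0..levels-1 hold Haar
    detail coefficients from fine to coarse; the last row holds the coarsest
    approximation. Each coefficient is repeated so that every row has the
    same length as the input.
    """
    n = len(vector)
    if levels < 1 or n == 0 or n % (2 ** levels) != 0:
        raise ValueError("len(vector) must be a positive multiple of 2**levels")

    approx = [float(v) for v in vector]
    rows = []
    for level in range(1, levels + 1):
        # One Haar step: orthonormal differences and sums of adjacent pairs.
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        span = 2 ** level  # number of input samples covered by one coefficient
        rows.append([d for d in detail for _ in range(span)])
    # Final row: the coarsest approximation, stretched to the input length.
    rows.append([a for a in approx for _ in range(2 ** levels)])
    return rows


# Toy vector for illustration only (assumption: not real barcode or image features).
features = [4.0, 2.0, 0.0, 0.0, 1.0, 1.0, 3.0, 5.0]
matrix = haar_feature_matrix(features, levels=2)
print(len(matrix), "x", len(matrix[0]))
```

A two-dimensional layout of this kind lets a convolutional network treat scale (rows) and position (columns) as image axes. Whether the authors' pipeline uses this particular layout is not stated in the abstract.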