Abstract: Many fish species are similar in appearance, so identifying the species from an image is a challenging task known as fine-grained image recognition (FGIR). Although several model architectures that focus on local regions have been proposed for FGIR and shown to be effective, expanding the training dataset remains important for improving model performance and preventing overfitting. In this study, we develop a method that improves the recognition accuracy of a fish-species FGIR task through data augmentation using the foundation model Grounding DINO. Our method uses Grounding DINO to crop fish regions from the training images as data augmentation, and these crops are used together with the original dataset to train the FGIR model. Experiments on the WildFish dataset demonstrate the effectiveness of our data augmentation method.
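The augmentation step described above (crop detected fish regions, then train on the crops together with the original images) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the detector is a hypothetical callable `detect` standing in for a text-prompted Grounding DINO query and does not use any real Grounding DINO API; boxes are assumed to be `(x0, y0, x1, y1, score)` in pixel coordinates; the `score_threshold` value, the one-crop-per-confident-box policy, and the helper names `crop` and `augment_with_crops` are all assumptions chosen for illustration. Images are plain nested lists so the sketch stays self-contained.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a grayscale image as rows of pixel values (H x W), standing in for real arrays.
Image = List[List[int]]
# Assumption: box as (x0, y0, x1, y1, score) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int, float]
Sample = Tuple[Image, str]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box, clipped to the image bounds."""
    x0, y0, x1, y1, _ = box
    h, w = len(image), len(image[0])
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def augment_with_crops(
    dataset: Sequence[Sample],
    detect: Callable[[Image], List[Box]],
    score_threshold: float = 0.35,  # hypothetical confidence cut-off
) -> List[Sample]:
    """Keep every original sample and add one cropped copy per confident detection.

    Each crop inherits the species label of the image it was cut from.
    """
    augmented: List[Sample] = list(dataset)  # original dataset is retained
    for image, label in dataset:
        for box in detect(image):
            x0, y0, x1, y1, score = box
            # Skip low-confidence or degenerate boxes.
            if score < score_threshold or x1 <= x0 or y1 <= y0:
                continue
            patch = crop(image, box)
            if patch and patch[0]:  # skip boxes that fall entirely outside the image
                augmented.append((patch, label))
    return augmented


if __name__ == "__main__":
    # Hypothetical stub in place of a "fish"-prompted detector: one confident box, one weak box.
    def fake_detector(image: Image) -> List[Box]:
        return [(1, 1, 3, 3, 0.9), (0, 0, 1, 1, 0.1)]

    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    out = augment_with_crops([(img, "species_a")], fake_detector)
    print(len(out), out[1])
```

Keeping the original samples alongside the crops mirrors the abstract's description of training on both; in practice the crops would also be resized to the classifier's input resolution, which this sketch omits.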