Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries
Keywords: Electronic Monitoring, Re-identification, fine-grained classification, Fisheries, Metric learning, Deep Learning
TL;DR: This study demonstrates the superiority of the Swin Transformer architecture over traditional CNNs for fine-grained fish re-identification, achieving 90% rank-1 accuracy in a simulated Electronic Monitoring (EM) environment for fisheries.
Abstract: Accurate fisheries data are crucial for effective and sustainable marine resource management. With the recent adoption of Electronic Monitoring (EM) systems, more video data is now being collected than can feasibly be reviewed manually. This paper addresses this challenge by developing an optimized deep learning pipeline for automated fish re-identification (Re-ID) using the novel AutoFish dataset, which simulates conveyor-belt EM systems containing six visually similar fish species. We demonstrate that key Re-ID metrics (R1 and mAP@k) are substantially improved by using hard triplet mining in conjunction with a custom image transformation pipeline that includes dataset-specific normalization. By employing these strategies, we show that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50, achieving a peak performance of 41.65\% mAP@k and 90.43\% Rank-1 accuracy. An in-depth analysis reveals that the primary challenge is not inter-species confusion but distinguishing visually similar individuals of the same species (intra-species errors), especially in challenging scenarios where fish are both occluded and presented from opposite body sides.
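The abstract credits hard triplet mining for much of the metric gain. The paper's exact implementation is not shown here, but the commonly used "batch-hard" variant can be sketched as follows: for each anchor in a mini-batch, take the farthest same-identity sample as the positive and the closest different-identity sample as the negative. The function name, margin value, and NumPy formulation below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet mining sketch (illustrative, not the paper's code).

    For each anchor, the loss uses the hardest positive (farthest
    same-identity sample) and the hardest negative (closest
    different-identity sample) within the mini-batch.
    """
    # Pairwise Euclidean distance matrix, shape (B, B).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    eye = np.eye(len(labels), dtype=bool)
    pos_mask = same & ~eye   # same identity, excluding the anchor itself
    neg_mask = ~same         # different identity

    # Hardest positive: maximum distance among same-identity pairs.
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    # Hardest negative: minimum distance among different-identity pairs.
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    # Hinge on the margin, averaged over anchors.
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

When identities are well separated in embedding space the loss is zero; when individuals of different identities overlap (the intra-species confusions the abstract highlights), the hardest negatives dominate and the loss grows.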
Serve As Reviewer: ~Ercan_Avsar1
Submission Number: 61