DeepSeaVision: Enhanced Detection and Classification of Underwater Species

G. Joshita Reddy, Keerthika Reddy, Sai Ruthvik Athota, Swathi Jamjala Narayanan, Boominathan Perumal, Gaurav Kumar Nayak

Published: 01 Jan 2025, Last Modified: 06 Nov 2025 | IEEE Access | CC BY-SA 4.0

Abstract: Marine ecosystems are changing under pressure from global warming, pollution, and overexploitation, making the sustainable management of marine biodiversity more pressing than ever. Manual monitoring of oceanic species, though vital, is arduous, time-consuming, and ecologically sensitive. Artificial intelligence has therefore emerged as a key tool for species detection and classification. However, detecting species in obscured underwater images remains a significant challenge that undermines the reliability of existing computer vision algorithms. This paper proposes a framework for underwater fish species detection that addresses the limitations of current datasets and models. We present a pipeline that applies underwater image enhancement techniques, namely Gamma Correction, CLAHE, RGHS, and UCM, followed by YOLO-based object detection to improve the identification of underwater entities. The proposed framework demonstrates improved accuracy in detection and classification. We carefully collect a web-scraped dataset featuring 20 marine species from the Indian Ocean, which improves on existing datasets that suffer from poor image quality and inadequate annotations. The proposed framework not only predicts known species but also detects unknown species. This contribution aims to advance marine biodiversity monitoring by offering a robust and efficient framework for automating fish species detection in challenging underwater environments. On our dataset, YOLOv9 produces the most accurate results among all the YOLO models compared, which we attribute to its integration of the Generalized Efficient Layer Aggregation Network (GELAN) and Programmable Gradient Information (PGI). GELAN enhances YOLOv9's learning capacity by efficiently aggregating information across layers, minimizing redundancy, and retaining the features necessary for accurate object detection. PGI optimizes the gradient flow during training, allowing YOLOv9 to better preserve and use essential information, even in deeper layers. Although models like YOLOv12 offer lower inference time, YOLOv9 trained on the raw dataset still outperforms YOLOv12 trained on the enhanced dataset in overall detection accuracy. By addressing the information loss inherent in deep neural networks through these mechanisms, YOLOv9 achieves exceptional precision and adaptability.
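As an illustration of the first enhancement stage named in the abstract, the following is a minimal sketch of gamma correction on an 8-bit grayscale image, written in plain Python. It is not the authors' implementation: the function names, the sample patch, and the value gamma = 0.5 are hypothetical, and the convention used here (output = 255 * (input / 255) ** gamma, so gamma below 1 brightens) is an assumption, since the abstract does not state the parameters used. The remaining stages (CLAHE, RGHS, UCM, and YOLO-based detection) are not shown.

```python
def build_gamma_lut(gamma):
    """Return a 256-entry lookup table mapping an 8-bit intensity v
    to round(255 * (v / 255) ** gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255.0 * (v / 255.0) ** gamma) for v in range(256)]


def gamma_correct(image, gamma):
    """Apply gamma correction to a grayscale image given as a list of rows,
    each row a list of integer intensities in the range 0..255."""
    lut = build_gamma_lut(gamma)
    return [[lut[pixel] for pixel in row] for row in image]


# Hypothetical dark patch standing in for an underexposed underwater frame.
dark_patch = [
    [0, 16, 32],
    [64, 128, 255],
]

# gamma < 1 lifts dark and mid tones while leaving pure black and white fixed
# (under the convention assumed above).
brightened = gamma_correct(dark_patch, gamma=0.5)
```

A lookup table is used because an 8-bit image has only 256 possible input values, so the power function is evaluated once per value rather than once per pixel.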

External IDs: doi:10.1109/access.2025.3618106