Prior-Guided Dual-Reference Contrastive Learning for Underwater Object Detection

Lei Cao, Liquan Shen, Meng Yu, Zhengyong Wang, Cheng Shen

Published: 2025 · Last Modified: 28 Feb 2026 · IEEE Trans. Circuits Syst. Video Technol., 2025 · CC BY-SA 4.0
Abstract: Underwater object detection (UOD) plays an important role in the exploitation of marine ecological resources. Unlike terrestrial images, underwater images suffer significant degradation from the complex underwater environment, which makes accurate object detection difficult. In recent years, many specially designed UOD methods have been proposed to improve detection precision in two ways, based on underwater image characteristics: 1) Some UOD methods use underwater image enhancement (UIE) to alleviate degradation in the expectation of clean features. However, neither preprocessing nor cascaded approaches are fully effective for detection-oriented enhancement, and the additional UIE network increases inference time. 2) Other UOD methods address the low visibility of objects, the blurriness of small objects, and occlusion. However, they ignore both the semantic complementarity between objects of the same category but different qualities and the background patterns associated with specific objects. Based on these two observations, we propose a novel framework for the UOD task that performs feature enhancement in two ways. First, a group contrastive-based feature enhancement module (GCFEM) is proposed to bridge UIE and UOD. Specifically, multiple versions enhanced by different UIE methods are scored by an object-detection precision evaluation pipeline. Group-based contrastive learning is then introduced, which uses these groups of enhanced versions to guide the backbone toward extracting detection-friendly features. Second, a prior-guided dual-reference feature enhancement module (PDFEM) is proposed to further enhance object representations. Specifically, an explicit object-object relationship lets low-quality object regions refer to high-quality ones, guided by a transmission map, while an implicit object-background relationship provides cues about the surroundings for object representation.
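The group-based contrastive learning in GCFEM can be illustrated with a minimal InfoNCE-style sketch. This is an assumption for exposition only, not the paper's implementation: the function names, the cosine-similarity choice, and the temperature value are all hypothetical, and the paper's actual loss operates on backbone feature maps rather than toy vectors. The idea shown is that an anchor feature (from the raw underwater image) is pulled toward groups of positives (features of the same image under different UIE enhancements) and pushed from negatives (features of other images).

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_contrastive_loss(anchor, positive_groups, negatives, tau=0.1):
    """InfoNCE-style loss averaged over groups of positives.

    Hypothetical sketch: each group holds features of the same image
    produced by a different UIE method; negatives are features of other
    images. Low loss means the anchor already resembles its enhanced
    versions more than it resembles the negatives.
    """
    losses = []
    for group in positive_groups:
        for pos in group:
            pos_sim = math.exp(cosine(anchor, pos) / tau)
            neg_sum = sum(math.exp(cosine(anchor, n) / tau)
                          for n in negatives)
            losses.append(-math.log(pos_sim / (pos_sim + neg_sum)))
    return sum(losses) / len(losses)
```

With an anchor identical to its positive and an orthogonal negative, the loss is near zero; when a negative coincides with the positive, the loss rises toward log 2, which is the expected InfoNCE behavior.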
Experimental results demonstrate that the proposed algorithm outperforms many state-of-the-art UOD methods on the RUOD and URPC2020 datasets.