ViT-UWA: Vision Transformer Underwater-Adapter for Dense Predictions Beneath the Water Surface

Qirui LIN; Hua Li; Yuheng Jia; Yutong Li; Shijie Lian; Huazhong Liu; Sam Kwong; Runmin Cong

23 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Underwater Image Dense Prediction, Adapted ViT Backbone

Abstract: Vision Transformer (ViT) and its variants have achieved significant success in computer vision. However, they do not perform well in underwater dense prediction tasks because of challenges such as complex underwater environments, image quality degradation, and light scattering. To address this problem, we propose the Vision Transformer Underwater-Adapter (ViT-UWA), the first detail-focused, adapted ViT backbone for underwater dense prediction tasks that requires no task-specific pretraining. In ViT-UWA, we first introduce a High-frequency Components Prior (HFCP) that adds high-frequency information from underwater images to the plain ViT, helping it recover and capture the high-frequency details that underwater images lose. Then, we propose a Detail Aware Module (DAM) to obtain a detail-focused multi-scale convolutional feature pyramid, which can be used in a variety of dense prediction tasks. Through the ViT-CNN Interaction Module (VCIM), we achieve bidirectional feature fusion between the ViT and the CNN. We evaluate ViT-UWA on multiple underwater dense prediction tasks, including semantic segmentation, instance segmentation, and object detection. Notably, with only ImageNet-22K pretraining, our ViT-UWA-B yields a state-of-the-art 46.4 box AP and 44.2 mask AP on the USIS10K dataset. We hope ViT-UWA can provide a new backbone for future research on underwater dense prediction tasks.
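
The abstract does not say how the high-frequency components used by HFCP are computed. As a purely illustrative sketch, one common way to isolate high-frequency content is a Fourier-domain high-pass filter. The function name `high_frequency_component` and the `cutoff_ratio` parameter are hypothetical and not taken from the paper, and the authors' actual procedure may differ.

```python
import numpy as np


def high_frequency_component(image, cutoff_ratio=0.25):
    """High-pass filter a 2-D grayscale image in the Fourier domain.

    Illustrative assumption only: this is a generic high-pass filter,
    not the HFCP computation from the paper. `cutoff_ratio` is a
    hypothetical parameter giving the radius of the removed low-frequency
    disc as a fraction of the half-width of the spectrum.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Normalized distance of every frequency bin from the spectrum center.
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    radius = np.sqrt(((yy - cy) / (h / 2)) ** 2 + ((xx - cx) / (w / 2)) ** 2)

    # Keep only the bins outside the low-frequency disc.
    high_pass_mask = radius > cutoff_ratio
    filtered = spectrum * high_pass_mask

    # Back to the spatial domain; the imaginary part is numerical noise.
    return np.fft.ifft2(np.fft.ifftshift(filtered)).real
```

Under these assumptions, a constant image maps to (numerically) zero, while fine textures and edges pass through largely unchanged. A map of this kind could then be supplied to a backbone alongside the original image.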

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 2807
