Not All Pixels Sink: Phase-Guided Representation Learning for Underwater Image Restoration

ICLR 2026 Conference Submission 25095 Authors

20 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Readers: Everyone | CC BY 4.0
Keywords: Phase-Guided Representation Learning, Underwater Image Restoration, Phase based Attention, Color-Plausibility Quality Index (CPQI)
TL;DR: We propose NemoNet, an encoder-decoder with phase-guided learning for underwater image enhancement. A hybrid loss corrects color shifts, and we introduce the CPQI metric to evaluate color consistency beyond conventional metrics.
Abstract: Underwater images suffer from color absorption, light scattering, and non-uniform haze, making reliable restoration crucial for marine science and autonomous navigation. We propose NemoNet, a novel encoder–decoder architecture that leverages phase-guided representation learning to overcome these challenges. The architecture incorporates a Spectral–Spatial Attention (SSA) block that couples Fourier phase-based pixel refinement with spatial attention to recover fine textures. These details are the most severely degraded in underwater conditions and are critical for perceptually convincing restoration more broadly. Phase-based attention in the skip connections ensures that they pass on useful representations instead of propagating artifacts. We introduce a hybrid Un/Supervised loss framework, in which comprehensive supervised objectives are complemented by an unsupervised color consistency loss that mitigates wavelength-dependent color shifts in underwater scenes. We further introduce a no-reference Color-Plausibility Quality Index (CPQI) that augments the Perceptual Index with a color consistency prior, a property that conventional metrics fail to capture. Comprehensive experiments demonstrate that the proposed approach outperforms existing state-of-the-art methods on supervised (UIEB, LSUI, EUVP) and unsupervised (U45, SUIM) underwater image datasets across both conventional and proposed metrics.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 25095
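
The abstract's Fourier phase-based refinement rests on splitting an image's spectrum into amplitude and phase. The NumPy sketch below illustrates only that general decomposition, together with a phase-only reconstruction. It is not the authors' SSA block or any part of NemoNet: the function names, the synthetic test image, and the unit-amplitude choice are all assumptions made for illustration.

```python
import numpy as np

def split_amplitude_phase(image):
    """Return (amplitude, phase) of the 2-D FFT of a single-channel image."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Inverse FFT of amplitude * exp(i * phase); keeps the real part."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))

def phase_only_reconstruction(image):
    """Rebuild an image from its phase alone, with amplitude set to 1 everywhere."""
    _, phase = split_amplitude_phase(image)
    return recombine(np.ones_like(phase), phase)

# Hypothetical test image: a bright square on a dark background.
image = np.zeros((64, 64))
image[20:44, 20:44] = 1.0

amplitude, phase = split_amplitude_phase(image)
roundtrip = recombine(amplitude, phase)        # uses both amplitude and phase
phase_only = phase_only_reconstruction(image)  # discards the amplitude
```

Here `roundtrip` recovers the input up to floating-point error, while `phase_only` shows what remains when the amplitude is thrown away. For a color image, the same functions could be applied to each channel separately.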