Enhancement of Underwater Images Using Lattice Transformer at Multiple Scales and Layers

Wei Yen Hsu, I. Ting Chen, Yu Ming Hsieh

Published: 01 Jan 2025, Last Modified: 01 Mar 2026, IEEE Journal of Oceanic Engineering, CC BY-SA 4.0

Abstract: Images captured underwater frequently suffer from color distortion and loss of detail because light is absorbed and scattered in water. Enhancement is made more difficult by attenuation that varies with wavelength and distance, and by color shifts that occur across different scales and layers. To address these issues, we propose a novel multiscale and multilayer lattice transformer (MMLattFormer) that removes color distortions and artifacts, avoids overenhancement, and preserves detail across scales and layers, yielding more accurate and natural enhanced underwater images. MMLattFormer builds on the lattice transformer (LattFormer) architecture to strengthen global perception, while its multiscale and multilayer configuration uses the differences and complementarities among features at different scales to improve local perception. The model consists of multiple LattFormers arranged in a multiscale and multilayer fashion. Each LattFormer includes two main modules: the Multihead Transposed-attention Residual Network (MTRN) and the Gated-attention Residual Network (GRN). The MTRN module enables efficient cross-pixel interaction and pixel-level aggregation, extracting more salient and discriminative features, whereas the GRN module suppresses less informative or redundant features so that only useful information is retained. Together, the two modules exploit both the local and global structure of an image to achieve high-quality restoration. Qualitative and quantitative results on multiple public data sets demonstrate that the proposed method outperforms state-of-the-art approaches and produces more natural results: it preserves details, prevents overenhancement, and eliminates artifacts and color deviations.
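The abstract does not describe the internals of MTRN or GRN, so what follows is only a rough, dependency-free Python sketch of the two ideas they name. It assumes that the "transposed" attention in MTRN works like the channel-wise attention popularized by Restormer: each head compares channels (rows of a C × N feature matrix) rather than pixels, using L2-normalized queries and keys and a temperature-scaled softmax. It also assumes that GRN's gating can be approximated by a sigmoid gate on a projected branch that is added back residually. The dense projection matrices (`wq`, `wk`, `wv`, `w_value`, `w_gate`), the `temperature` parameter, and all function names are hypothetical. This is not the authors' implementation, and a real network would use learned convolutions on image tensors.

```python
import math
from typing import List

# Features are stored as C x N: one row per channel, one column per (flattened) pixel.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def l2_normalize_rows(m: Matrix) -> Matrix:
    """Scale each channel (row) to unit L2 norm across pixels."""
    out = []
    for row in m:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero
        out.append([v / norm for v in row])
    return out


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def transposed_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                         num_heads: int, temperature: float = 1.0) -> Matrix:
    """Hypothetical multi-head channel-wise ("transposed") attention plus a residual.

    wq, wk, wv are assumed C x C projection weights. Each head builds a
    (C/h) x (C/h) attention map, so its size does not depend on the pixel count N.
    """
    c = len(x)
    if c % num_heads:
        raise ValueError("channel count must be divisible by num_heads")
    d = c // num_heads
    q, k, v = matmul(wq, x), matmul(wk, x), matmul(wv, x)
    attended: Matrix = []
    for h in range(num_heads):
        rows = slice(h * d, (h + 1) * d)
        qh, kh = l2_normalize_rows(q[rows]), l2_normalize_rows(k[rows])
        # attn[i][j]: how strongly output channel i draws on value channel j.
        attn = [softmax([temperature * sum(a * b for a, b in zip(qi, kj)) for kj in kh])
                for qi in qh]
        attended.extend(matmul(attn, v[rows]))
    # Residual connection: add the attended features back onto the input.
    return [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, attended)]


def gated_residual(x: Matrix, w_value: Matrix, w_gate: Matrix) -> Matrix:
    """Hypothetical gated residual: a sigmoid gate in (0, 1) scales each projected
    feature before it is added to x, so features with strongly negative gate
    logits contribute almost nothing."""
    value, gate = matmul(w_value, x), matmul(w_gate, x)
    return [[xi + vi * sigmoid(gi) for xi, vi, gi in zip(xr, vr, gr)]
            for xr, vr, gr in zip(x, value, gate)]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]  # C = 2 channels, N = 3 pixels
    eye = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    y = transposed_attention(x, eye, eye, eye, num_heads=1)
    print(gated_residual(y, eye, zeros))
```

Running the script prints the gated output of a single-head attention pass on a toy 2 × 3 feature map. When `num_heads` equals the channel count, each head's attention map is 1 × 1 with weight 1, so with identity projections the attention block simply returns `x + x`, which makes a convenient sanity check. The point the sketch is meant to illustrate is that each head's attention map is (C/h) × (C/h), independent of image resolution, which is the usual argument for transposed attention on high-resolution inputs.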

External IDs: doi:10.1109/joe.2025.3583856