T-PhISH-Net: Temporally Consistent Underwater Image Enhancement via a Transformer-Based Extension of PhISH-Net

Bjørn Christian Weinbach

Published: 05 Nov 2025, Last Modified: 05 Nov 2025 | NLDL 2026 Abstracts | CC BY 4.0

Keywords: Underwater image enhancement; Temporal consistency; Transformer; Physics-based model; Flicker reduction; Marine robotics

TL;DR: A physics-guided Transformer model that stabilizes underwater video enhancement by learning temporal coherence without using optical flow.

Abstract: Underwater video feeds often suffer from flicker: erratic frame-wise color and illumination shifts caused by wavelength-dependent light attenuation and scattering in water. This instability degrades the performance of vision systems for marine robotics and monitoring, which require consistent imagery over time. We address this with T-PhISH-Net, a novel underwater enhancement model that extends the single-image PhISH-Net framework into a temporally consistent video enhancer. T-PhISH-Net processes a causal window of frames and introduces three key innovations: (1) a motion-magnitude input channel that feeds per-pixel movement cues to the network, (2) a causal Transformer encoder with a learnable decay gate that weights the contributions of past frames, and (3) an adaptive loss weighting scheme that balances fidelity and coherence by learning spatial and temporal loss weights during training. Experiments on four underwater video datasets show that T-PhISH-Net produces state-of-the-art enhancement quality and significantly reduces flicker. The model runs in real time on sequential input, enabling flicker-free underwater video feeds for ROV/AUV operations, marine ecology surveys, and infrastructure inspection.
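The abstract does not say how the three components are implemented, so the following NumPy sketch is only an illustration of how such pieces could look, not the authors' method. Every name here (`motion_magnitude`, `causal_decay_attention`, `weighted_loss`, `decay_logit`, `log_var_s`, `log_var_t`) is hypothetical. It assumes that the motion cue is an absolute difference between consecutive grayscale frames, that the decay gate acts as a penalty on attention scores that grows linearly with the frame lag, and that the adaptive loss weights follow a common uncertainty-style formulation.

```python
import numpy as np


def motion_magnitude(frames):
    """Per-pixel motion cue from absolute frame differences (an assumption).

    frames: array of shape (T, H, W, 3) with values in [0, 1].
    Returns an array of shape (T, H, W, 1); the first frame gets zeros.
    """
    gray = frames.mean(axis=-1)                       # (T, H, W)
    diff = np.abs(np.diff(gray, axis=0))              # (T-1, H, W)
    mag = np.concatenate([np.zeros_like(gray[:1]), diff], axis=0)
    return mag[..., None]


def causal_decay_attention(x, decay_logit):
    """Single-head self-attention over time with a causal mask and a decay.

    x: array of shape (T, D), one feature vector per frame.
    decay_logit: scalar standing in for a learnable gate parameter.
    Returns the attended features (T, D) and the attention weights (T, T).
    """
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)                     # (T, T) similarities
    idx = np.arange(T)
    lag = idx[:, None] - idx[None, :]                 # lag[i, j] = i - j
    rate = np.log1p(np.exp(decay_logit))              # softplus keeps rate > 0
    scores = scores - rate * np.maximum(lag, 0)       # penalise older frames
    scores = np.where(lag >= 0, scores, -np.inf)      # block future frames
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


def weighted_loss(spatial_loss, temporal_loss, log_var_s, log_var_t):
    """Combine two losses with learnable log-variance weights (an assumption).

    Each term is exp(-s) * L + s, so a larger s down-weights the loss L
    while the additive s discourages ignoring it entirely.
    """
    return (np.exp(-log_var_s) * spatial_loss + log_var_s
            + np.exp(-log_var_t) * temporal_loss + log_var_t)
```

In a trainable model the scalars `decay_logit`, `log_var_s`, and `log_var_t` would be parameters updated by gradient descent; here they are plain numbers so that the causal masking and the weighting arithmetic can be inspected directly.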

Serve As Reviewer: ~Bjørn_Christian_Weinbach1

Submission Number: 5
