Abstract: Under low-light conditions, video quality is heavily degraded by noise, artifacts, and weak contrast, resulting in a low signal-to-noise ratio. Enhancing low-light video to recover high-quality information is therefore a challenging problem. Deep-learning-based methods have achieved good performance on low-light enhancement tasks, and the majority of them are based on Unet. However, the widely used Unet architecture may generate pseudo-detail textures, because its simple skip connections introduce feature inconsistency between the encoding and decoding stages. To overcome these shortcomings, we propose a novel network, 3D Swin Skip Unet (3DS$^2$Unet), in this paper. Specifically, we design a novel feature extraction and reconstruction module based on the Swin Transformer, together with a temporal-channel attention module. The two modules generate temporal-spatial complementary features, which are then fed into the decoder. Experimental results show that our model restores object textures in video well, and performs better at removing noise and preserving object boundaries across frames under low-light conditions.
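The abstract does not give the equations of the temporal-channel attention module. As a rough, hypothetical illustration of the general idea of such a module (reweighting video features per frame and per channel), here is a minimal NumPy sketch; the function name, the pooling-plus-sigmoid gating, and the tensor layout `(T, C, H, W)` are all assumptions, not the paper's actual design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_channel_attention(feat):
    """Hypothetical sketch: reweight video features per (frame, channel).

    feat: video features of shape (T, C, H, W), where T indexes frames.
    A real learned module would replace the pooling/gating below with
    trainable layers; this only illustrates the reweighting pattern.
    """
    T, C, H, W = feat.shape
    # Global average pooling over spatial dims -> one descriptor per (frame, channel)
    desc = feat.mean(axis=(2, 3))            # shape (T, C)
    # Gate each (frame, channel) slot into (0, 1)
    weights = sigmoid(desc)                  # shape (T, C)
    # Broadcast the weights back over the spatial dimensions
    return feat * weights[:, :, None, None]  # shape (T, C, H, W)
```

In this sketch, channels that are informative across a frame receive larger gates, while frames or channels dominated by near-zero (noisy, dark) activations are attenuated toward a 0.5 weight.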