Abstract: Low-light image enhancement is a challenging yet beneficial task in computer vision that aims to improve the quality of images captured under poor illumination conditions. It involves addressing difficulties such as color distortions and noise, which often degrade the visual fidelity of low-light images. Although numerous CNN-based and ViT-based approaches have been proposed, the potential of diffusion models in this domain remains unexplored. This paper presents L\(^2\)DM, a novel framework for low-light image enhancement using diffusion models. As a latent diffusion model, L\(^2\)DM reduces computational requirements by performing the denoising and diffusion processes in latent space. Because conditioning inputs are essential for guiding the enhancement process, a new ViT-based network called ViTCondNet is introduced to efficiently incorporate the conditioning low-light inputs into the image generation pipeline. Extensive experiments on the LOL benchmark datasets demonstrate that L\(^2\)DM achieves state-of-the-art performance compared to its diffusion-based counterparts. The L\(^2\)DM source code is available on GitHub for reproducibility and further research.
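The abstract does not detail ViTCondNet's internals or how its output conditions the latent denoiser, so the following PyTorch sketch only illustrates the general recipe it describes: a ViT-style encoder turns the low-light input into conditioning tokens, and a denoiser trained with the standard latent-diffusion noise-prediction objective attends to those tokens. Everything here is an assumption for illustration (the toy autoencoder, the cross-attention conditioning, the cosine noise schedule, and all names except ViTCondNet), not the paper's actual implementation.

```python
# Minimal sketch of conditioned latent diffusion for low-light enhancement.
# Hypothetical stand-ins; the paper's real ViTCondNet/denoiser may differ.
import torch
import torch.nn as nn

class ViTCondNet(nn.Module):
    """ViT-style conditioning encoder (assumed design): patchify the
    low-light image and run transformer blocks to produce tokens."""
    def __init__(self, patch=16, dim=256, depth=4, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, low_light):                   # (B, 3, H, W)
        tokens = self.patch_embed(low_light)        # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens)                 # conditioning tokens

class LatentDenoiser(nn.Module):
    """Toy noise predictor in latent space; injects the conditioning
    tokens via cross-attention, a common LDM-style choice (assumed)."""
    def __init__(self, z_dim=4, dim=256):
        super().__init__()
        self.in_proj = nn.Conv2d(z_dim, dim, 1)
        self.cross_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.out_proj = nn.Conv2d(dim, z_dim, 1)

    def forward(self, z_t, t, cond_tokens):         # z_t: (B, z_dim, h, w)
        h = self.in_proj(z_t)
        B, C, H, W = h.shape
        q = h.flatten(2).transpose(1, 2)            # latent pixels as queries
        attn, _ = self.cross_attn(q, cond_tokens, cond_tokens)
        h = (q + attn).transpose(1, 2).reshape(B, C, H, W)
        return self.out_proj(h)                     # predicted noise

def training_step(encoder, cond_net, denoiser, normal_light, low_light, T=1000):
    """One DDPM-style training step in latent space: diffuse the clean
    latent to step t, then predict the added noise given the condition."""
    z0 = encoder(normal_light)                      # clean latent
    t = torch.randint(0, T, (z0.shape[0],))
    noise = torch.randn_like(z0)
    abar = torch.cos(t.float() / T * torch.pi / 2).view(-1, 1, 1, 1) ** 2
    z_t = abar.sqrt() * z0 + (1 - abar).sqrt() * noise  # forward diffusion
    eps = denoiser(z_t, t, cond_net(low_light))
    return nn.functional.mse_loss(eps, noise)

# Usage with dummy tensors; a real setup would use a pretrained autoencoder.
enc = nn.Sequential(nn.Conv2d(3, 4, 8, 8))          # toy latent encoder
cond_net, denoiser = ViTCondNet(), LatentDenoiser()
x_hi, x_lo = torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128)
loss = training_step(enc, cond_net, denoiser, x_hi, x_lo)
```

At inference, the same recipe would start from pure latent noise and iteratively denoise while attending to the ViTCondNet tokens, then decode the final latent back to image space; operating on the compact latent rather than full-resolution pixels is what yields the reduced computational cost the abstract claims.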