SIRLUT: Simulated Infrared Fusion Guided Image-adaptive 3D Lookup Tables for Lightweight Image Enhancement
Abstract: Researchers have applied 3D Lookup Tables (LUTs) in cameras, offering new possibilities for enhancing image quality and achieving various tonal effects. However, these approaches often overlook the non-uniformity of color distribution in the original images, which limits the performance of learnable LUTs. To address this issue, we introduce a lightweight end-to-end image enhancement method called Simulated Infrared Fusion Guided Image-adaptive 3D Lookup Tables (SIRLUT). SIRLUT improves the adaptability of 3D LUTs by reorganizing the color distribution of images through the integration of simulated infrared imagery. Specifically, SIRLUT consists of an efficient Simulated Infrared Fusion (SIF) module and a Simulated Infrared Guided (SIG) refinement module. The SIF module leverages a cross-modal channel attention mechanism to perceive global information and generate dynamic 3D LUTs, while the SIG refinement module blends in simulated infrared images to keep the output consistent with the input in both structure and color, achieving local feature fusion. Experimental results demonstrate that SIRLUT outperforms state-of-the-art methods on different tasks by 0.88 $\sim$ 2.25 dB while reducing the number of parameters. Code is available at \href{https://github.com/riversky2025/SIRLUT.git}{https://github.com/riversky2025/SIRLUT}.
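For concreteness, the following is a minimal sketch of the core operation the abstract builds on: applying a (learned) 3D LUT to an RGB image via trilinear interpolation. It is written in PyTorch under our own assumptions; `apply_3d_lut` and the identity-LUT construction are illustrative stand-ins, not the released implementation (see the linked repository for that).

```python
# Minimal sketch (not the authors' code): applying a 3D LUT to an RGB
# image with trilinear interpolation via torch.nn.functional.grid_sample.
import torch
import torch.nn.functional as F

def apply_3d_lut(img: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """img: (B, 3, H, W) with values in [0, 1]; lut: (3, D, D, D)."""
    b, _, h, w = img.shape
    # grid_sample takes sampling coordinates in [-1, 1]; each pixel's
    # (R, G, B) value becomes its lookup coordinate into the LUT cube.
    grid = img.permute(0, 2, 3, 1).view(b, 1, h, w, 3) * 2.0 - 1.0
    lut = lut.unsqueeze(0).expand(b, -1, -1, -1, -1)  # (B, 3, D, D, D)
    # With 5-D input, mode='bilinear' performs trilinear interpolation.
    out = F.grid_sample(lut, grid, mode='bilinear',
                        padding_mode='border', align_corners=True)
    return out.squeeze(2)  # (B, 3, H, W)

# Identity LUT under grid_sample's (x, y, z) -> (W, H, D) axis convention:
# passing it through apply_3d_lut should reproduce the input image.
d = 33
ramp = torch.linspace(0.0, 1.0, d)
zz, yy, xx = torch.meshgrid(ramp, ramp, ramp, indexing='ij')
identity_lut = torch.stack([xx, yy, zz], dim=0)  # (3, D, D, D)

img = torch.rand(1, 3, 8, 8)
assert torch.allclose(apply_3d_lut(img, identity_lut), img, atol=1e-5)
```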
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: We propose a lightweight image enhancement method, Simulated Infrared Fusion Guided Image-adaptive 3D Lookup Tables (SIRLUT), that contributes to multimedia/multi-modal processing. The method uses simulated infrared fusion guidance to improve the adaptability of 3D lookup tables (LUTs) on images with non-uniform color distribution, addressing a key limitation of learnable LUTs. SIRLUT consists of an efficient Simulated Infrared Fusion (SIF) module, which generates dynamic 3D LUTs from cross-modal global features, and a Simulated Infrared Guided (SIG) refinement module, which fuses structural and color features locally. Our method achieves state-of-the-art performance on two public datasets while reducing the number of parameters, making it suitable for integration into hardware and software such as cameras, smartphones, and image-processing software. Overall, this work promotes more efficient and effective multimedia/multi-modal processing methods for real-world problems that combine multiple media and modalities.
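To make the cross-modal fusion idea above concrete, here is a hedged sketch of what a channel attention block fusing RGB features with a simulated infrared channel might look like. Everything here is an assumption for illustration: `simulate_infrared` (and its weights) and `CrossModalChannelAttention` are hypothetical stand-ins, not the paper's SIF module.

```python
# Hedged sketch (illustrative assumptions, not the paper's SIF module):
# squeeze-and-excitation style channel attention driven by both an RGB
# image and a single-channel simulated infrared map derived from it.
import torch
import torch.nn as nn

def simulate_infrared(rgb: torch.Tensor) -> torch.Tensor:
    # Toy stand-in: a fixed weighted sum of the RGB channels. The paper's
    # actual simulation model may differ; these weights are made up.
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return 0.5 * r + 0.3 * g + 0.2 * b

class CrossModalChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.rgb_conv = nn.Conv2d(3, channels, 3, padding=1)
        self.ir_conv = nn.Conv2d(1, channels, 3, padding=1)
        # Channel gate conditioned on globally pooled features of both modalities.
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        ir = simulate_infrared(rgb)
        f_rgb = self.rgb_conv(rgb)              # (B, C, H, W)
        f_ir = self.ir_conv(ir)                 # (B, C, H, W)
        pooled = torch.cat([f_rgb.mean(dim=(2, 3)),
                            f_ir.mean(dim=(2, 3))], dim=1)  # (B, 2C)
        w = self.gate(pooled)[:, :, None, None]             # (B, C, 1, 1)
        return w * f_rgb + (1.0 - w) * f_ir     # attention-weighted fusion

# Usage: fused features could feed a small head that predicts LUT weights.
fused = CrossModalChannelAttention(channels=16)(torch.rand(2, 3, 64, 64))
```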
Supplementary Material: zip
Submission Number: 917