Abstract: As demands for data and computational resources continue to escalate, the efficiency of data utilization becomes a critical lever for improving deep learning models, particularly in image restoration. This work investigates data efficiency in image restoration, using Gaussian image denoising as a case study. We hypothesize that model performance correlates strongly with the content information contained in the training images, and we test this hypothesis through experiments on synthetically blurred datasets. Building on this premise, we examine data efficiency within training datasets and introduce an effective and stabilized method for quantifying content information, which enables ranking training images by their influence. Our analysis shows how various subset selection strategies, informed by this ranking, affect model performance. We further examine the transferability of these efficient subsets across different network architectures. The findings show that comparable, if not superior, performance can be achieved with a fraction of the data: training IRCNN and Restormer with only 3.89% and 2.30% of the data, respectively, resulted in a negligible drop and, in some cases, a slight improvement in PSNR. This investigation offers insights and methodologies for addressing data efficiency challenges in Gaussian denoising, and our method yields comparable conclusions in other restoration tasks. We believe this will be beneficial for future research. Code will be available at [URL].
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Systems] Data Systems Management and Indexing
Relevance To Conference: Image restoration is a cornerstone of low-level vision research and plays a crucial role in various multimedia applications. Significant advances in this field are showcased at the ACM MM conference every year. Despite the extensive exploration of image restoration techniques, a gap remains in understanding the impact of data on model learning. Specifically, research investigating data efficiency in image restoration is scarce, leaving fertile ground for future inquiry. This manuscript makes the first attempt to provide a comprehensive exploration of data efficiency for image restoration, using Gaussian image denoising as a case study. We investigate the effect of content information on model performance, show that content information varies across images within training datasets, and introduce an effective, stabilized method to estimate the influence of each training image. Our experimental results show that it is possible to match or exceed the performance of IRCNN and Restormer denoising models trained on the full dataset with merely 3.89% and 2.30% of the original dataset, respectively. Additionally, we perform an in-depth analysis of the model's learning behavior under diverse conditions. Our methodological framework and insights mark a significant contribution to the field and could serve as valuable resources for future research.
Supplementary Material: zip
Submission Number: 1209