[Re] Smoothed Energy Guidance for Diffusion Models

TMLR Paper4319 Authors

22 Feb 2025 (modified: 22 May 2025), Rejected by TMLR, CC BY 4.0
Abstract: This study is part of the MLRC Reproducibility Challenge 2025 and aims to reproduce and improve the results of the NeurIPS 2024 submission "Smoothed Energy Guidance (SEG): Guiding Diffusion Models with Reduced Energy Curvature of Attention". The SEG paper has key limitations: it lacks an ablation study for selecting the optimal kernel size, and it leaves alternative blurring strategies within diffusion models unexplored, although these could offer valuable insights into enhancing image quality and model robustness. Furthermore, the approach applies smoothing at every iteration of the denoising process, which is unnecessary: it diminishes the clarity of the output and increases computational cost. To address these issues, we conducted a detailed ablation study and explored more efficient alternatives, including Exponential Moving Average (EMA) and BoxBlur using integral images, to improve computational efficiency while maintaining image quality. Our findings provide insights into optimizing smoothed energy guidance in diffusion models, reducing computational overhead while improving image quality.
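As an illustration of the "BoxBlur using integral images" idea mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. The function name `box_blur_integral`, the edge-replicating padding, and the use of a plain 2D array as the map being smoothed are all assumptions made for this example. The point it shows is that, once the integral image (summed-area table) is built, each window sum costs four lookups regardless of kernel size.

```python
import numpy as np


def box_blur_integral(x: np.ndarray, k: int) -> np.ndarray:
    """Mean-filter a 2D array with a k x k window (k odd) using an integral image.

    Hypothetical helper for illustration; edge-replicating padding is an assumption.
    """
    if k % 2 != 1 or k < 1:
        raise ValueError("k must be a positive odd integer")
    r = k // 2
    h, w = x.shape

    # Pad by the window radius so the output keeps the input shape.
    padded = np.pad(x.astype(np.float64), r, mode="edge")

    # Integral image with a leading zero row and column:
    # ii[a, b] holds the sum of padded[:a, :b].
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)

    # Sum over padded[i:i+k, j:j+k] for every output pixel (i, j),
    # computed from four corner lookups of the integral image.
    window_sum = (
        ii[k:k + h, k:k + w]
        - ii[:h, k:k + w]
        - ii[k:k + h, :w]
        + ii[:h, :w]
    )
    return window_sum / (k * k)
```

With this formulation the cost per output element does not grow with `k`, whereas a direct sliding-window sum performs work proportional to `k * k` per element.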
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In response to the feedback on the previous submission, we have made several significant improvements in this revised version of the manuscript. We have added an extensive literature review to better contextualize our research within existing work and to support the methodology adopted in this study. To quantitatively evaluate how effectively different blurring techniques smooth the energy curvature, we have added Fréchet Inception Distance (FID) scores across various settings of these techniques. We have also addressed the concern about statistical robustness by including more random seeds in our experiments for a more comprehensive comparison. This version provides a clearer explanation of how the proposed smoothing techniques (such as EMA and Box Blur) align with the Smoothed Energy Guidance framework introduced in this work. To enhance readability and traceability, we have improved the referencing of figures and equations throughout the manuscript, addressing earlier concerns that the experimental results and their corresponding explanations were difficult to follow. Finally, based on the reviewers' observations, we have fixed bugs in the generated image comparisons, particularly the issue with the last column in the "Blur Time = All" setting.
Assigned Action Editor: ~Qing_Qu2
Submission Number: 4319