InfoScale: Unleashing Training-free Variable-scaled Image Generation via Effective Utilization of Information
Abstract: Diffusion models (DMs) have become dominant in visual generation, but their performance drops when they generate images at resolutions that differ from the training scale, whether lower or higher.
Current training-free methods for DMs have shown promising results, primarily focusing on higher-resolution generation. However, most methods lack a unified analytical perspective for variable-scale generation, leading to suboptimal results.
In fact, the key challenge in generating variable-scaled images is that the amount of information differs across resolutions, so the way information is processed must adapt to the target scale.
In this paper, we investigate three critical aspects of DMs to provide a unified analysis of variable-scaled generation: dilated convolution, attention mechanisms, and initial noise.
Specifically, 1) the dilated convolution used in DMs for higher-resolution generation loses high-frequency information.
2) Attention in variable-scaled image generation struggles to adaptively adjust how information is aggregated.
3) The spatial distribution of information in the initial noise is misaligned with that of the variable-scaled image.
To address these problems, we propose $\textbf{InfoScale}$, an information-centric framework for variable-scaled image generation that utilizes information effectively in each of these three aspects.
For the information loss in 1), we introduce a Progressive Frequency Compensation module that restores the high-frequency information lost by dilated convolution in higher-resolution generation.
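To make the idea concrete, here is a minimal, purely illustrative sketch of frequency compensation, not the paper's implementation: the high-frequency band of an ordinary convolution's output is spliced into the dilated convolution's output with an FFT-based mask. The function names (`frequency_compensate`, `high_pass_mask`) and the cutoff/strength values are hypothetical.

```python
# Illustrative only: blend the high-frequency band of a standard convolution's
# output back into the output of the same weights applied with dilation.
import torch
import torch.nn.functional as F


def high_pass_mask(h, w, cutoff=0.25, device="cpu"):
    """Boolean mask selecting spatial frequencies above cutoff * Nyquist."""
    fy = torch.fft.fftfreq(h, device=device).abs()
    fx = torch.fft.fftfreq(w, device=device).abs()
    radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return radius > cutoff * 0.5  # 0.5 cycles/sample = Nyquist


def frequency_compensate(dilated_feat, standard_feat, cutoff=0.25, strength=1.0):
    """Add the high-frequency content of standard_feat back into dilated_feat."""
    h, w = dilated_feat.shape[-2:]
    mask = high_pass_mask(h, w, cutoff, dilated_feat.device)
    f_dil = torch.fft.fft2(dilated_feat)
    f_std = torch.fft.fft2(standard_feat)
    # Low frequencies come from the dilated branch; high frequencies are
    # (partially) replaced by those of the standard branch.
    f_out = torch.where(mask, f_dil + strength * (f_std - f_dil), f_dil)
    return torch.fft.ifft2(f_out).real


# Toy usage: the same 3x3 weights applied with and without dilation.
x = torch.randn(1, 4, 64, 64)
weight = torch.randn(4, 4, 3, 3)
standard = F.conv2d(x, weight, padding=1)
dilated = F.conv2d(x, weight, padding=2, dilation=2)
compensated = frequency_compensate(dilated, standard)
print(compensated.shape)  # torch.Size([1, 4, 64, 64])
```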
For the inflexible information aggregation in 2), we introduce an Adaptive Information Aggregation module that adaptively aggregates information in lower-resolution generation and balances local and global information in higher-resolution generation.
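One illustrative, training-free way to make attention aggregate information differently when the number of spatial tokens changes, shown below as a sketch rather than as the paper's Adaptive Information Aggregation module, is to rescale the attention logits with a log-ratio temperature so the softmax stays about as sharp as at the training resolution; `entropy_scaled_attention` and `n_train_tokens` are names assumed for this example.

```python
# Illustrative only: scaled dot-product attention with a test-time
# temperature adjustment for a changed token count.
import math
import torch


def entropy_scaled_attention(q, k, v, n_train_tokens):
    """q, k, v: (batch, heads, tokens, dim); n_train_tokens: token count at
    the training resolution."""
    n_test_tokens, dim = q.shape[-2], q.shape[-1]
    # Standard 1/sqrt(d) scaling, multiplied by a log-ratio factor that
    # sharpens (or flattens) the softmax when there are more (or fewer)
    # tokens than at training time.
    temperature = math.sqrt(math.log(n_test_tokens) / math.log(n_train_tokens))
    logits = (q @ k.transpose(-2, -1)) * temperature / math.sqrt(dim)
    weights = logits.softmax(dim=-1)
    return weights @ v


# Toy usage: 48x48 latent tokens at test time vs. 32x32 at training time.
q = k = v = torch.randn(1, 2, 48 * 48, 40)
out = entropy_scaled_attention(q, k, v, n_train_tokens=32 * 32)
print(out.shape)  # torch.Size([1, 2, 2304, 40])
```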
For the misaligned information distribution in 3), we design a Noise Adaptation module that re-distributes the information in the initial noise for variable-scaled generation.
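As a minimal sketch of one simple noise re-distribution strategy, assuming a latent-diffusion setup with 4-channel latents and not the paper's actual Noise Adaptation module, training-resolution noise can be resized to the target resolution and blended with fresh noise so that each latent pixel remains approximately standard Gaussian; `adapt_initial_noise` is a hypothetical name.

```python
# Illustrative only: build target-resolution latent noise whose low-frequency
# spatial layout follows a training-resolution noise map.
import torch
import torch.nn.functional as F


def adapt_initial_noise(train_hw=(64, 64), target_hw=(96, 96), channels=4,
                        batch=1, generator=None):
    base = torch.randn(batch, channels, *train_hw, generator=generator)

    # Nearest-neighbour resizing keeps the per-pixel variance at 1 but
    # duplicates values, so neighbouring pixels become correlated.
    up = F.interpolate(base, size=target_hw, mode="nearest")

    # Blend with independent noise to break the exact duplicates while keeping
    # unit variance: alpha**2 + beta**2 = 1.
    alpha = 0.7
    beta = (1.0 - alpha ** 2) ** 0.5
    fresh = torch.randn_like(up)
    return alpha * up + beta * fresh


latent = adapt_initial_noise()
print(latent.shape, latent.std().item())  # torch.Size([1, 4, 96, 96]), ~1.0
```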
Our method is plug-and-play, and extensive experiments demonstrate its effectiveness in variable-scaled image generation.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Lu_Jiang1
Submission Number: 6616