In recent years, diffusion models have dominated the field of image generation with their outstanding generation quality. However, large-scale pre-trained diffusion models are generally trained on fixed-size images and fail to maintain their performance at other aspect ratios. Existing diffusion-based methods for arbitrary-size image generation suffer from several issues, including the need for extensive fine-tuning or training, slow sampling, and noticeable edge artifacts. This paper presents InstantAS, a method for arbitrary-size image generation. InstantAS performs a non-overlapping minimum-coverage segmentation of the target image, minimizing the generation of redundant information and significantly improving sampling speed. To maintain the consistency of the generated image, we also propose an Inter-Domain Distribution Bridging method that integrates the distribution of the entire image and suppresses the divergence of diffusion paths across different regions. Furthermore, we propose a dynamic semantics-guided cross-attention method, allowing different regions to be controlled with different semantics. InstantAS can be applied to nearly any existing pre-trained Text-to-Image diffusion model. Experimental results show that InstantAS fuses regions more consistently than previous arbitrary-size image generation methods while sampling far faster.
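The abstract does not specify how the non-overlapping minimum-coverage segmentation is computed. The sketch below shows one plausible, minimal interpretation (an assumption, not the paper's actual algorithm): tile the target canvas with the model's native patch size and clip the edge tiles to the canvas bounds, so every pixel is covered exactly once by the minimum number of tiles, `ceil(H/t) * ceil(W/t)`.

```python
import math

def min_cover_tiles(height, width, tile=512):
    """Partition an (height x width) canvas into non-overlapping tiles.

    Edge tiles are clipped to the canvas bounds, so the tiling covers
    every pixel exactly once using ceil(height/tile) * ceil(width/tile)
    tiles. This is an illustrative sketch, not InstantAS's actual scheme.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            bottom = min(top + tile, height)   # clip last row to canvas
            right = min(left + tile, width)    # clip last column to canvas
            boxes.append((top, left, bottom, right))
    return boxes

# Example: a 768 x 1280 canvas with 512-pixel tiles -> 2 x 3 = 6 tiles
tiles = min_cover_tiles(768, 1280, tile=512)
assert len(tiles) == math.ceil(768 / 512) * math.ceil(1280 / 512)
# Non-overlapping cover: tile areas sum exactly to the canvas area
assert sum((b - t) * (r - l) for t, l, b, r in tiles) == 768 * 1280
```

Because the tiles do not overlap, each pixel is denoised once, which is where the claimed sampling-speed advantage over overlapping-patch methods would come from.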