E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: generative adversarial network, diffusion model, efficient training
Abstract: One highly promising direction for enabling flexible *real-time on-device* image editing is data distillation: leveraging large-scale text-to-image diffusion models, such as Stable Diffusion, to generate the paired datasets used for training generative adversarial networks (GANs). This approach notably alleviates the reliance on high-end commercial GPUs that image editing with diffusion models typically demands. However, unlike text-to-image diffusion models, each distilled GAN is specialized for a specific image editing task, necessitating costly training efforts to obtain models for various concepts. In this work, we introduce and address a novel research direction: *can the process of distilling GANs from diffusion models be made significantly more efficient?* To achieve this goal, we propose a series of innovative techniques. First, we design an attention-based network architecture tailored for efficient image-to-image translation on mobile devices, which achieves faster inference, fewer parameters, and lower computational cost than existing image-to-image models. Second, we introduce a hybrid training pipeline that efficiently adapts a pre-trained text-conditioned GAN to different concepts while substantially reducing computational costs; this approach also significantly reduces the storage required for each concept. Third, we investigate the minimal amount of data necessary to train each GAN, further reducing overall training time. Extensive experiments demonstrate that our method efficiently equips GANs to perform real-time, high-quality image editing on mobile devices with remarkably reduced per-concept training cost and storage.
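To make the data-distillation setting concrete, below is a minimal sketch (our illustration, not the authors' released code) of generating one (source, edited) training pair from a Stable Diffusion-based editing model. It assumes the Hugging Face `diffusers` library, a CUDA GPU, and the public InstructPix2Pix checkpoint; the prompt, file paths, and sampler settings are illustrative placeholders.

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load the public InstructPix2Pix checkpoint (assumes a CUDA GPU is available).
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# One editing "concept"; the prompt and image paths are placeholders.
source = Image.open("face.png").convert("RGB").resize((512, 512))
edited = pipe(
    "turn it into a watercolor painting",
    image=source,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("face_watercolor.png")  # (source, edited) forms one training pair
```

Repeating this over many source images yields the paired dataset on which a lightweight per-concept GAN can be trained.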
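The abstract's claim of reduced per-concept storage suggests sharing one frozen base generator across concepts and training/saving only a small per-concept module. The sketch below illustrates that general pattern in PyTorch with a hypothetical low-rank adapter; the module design, layer placement, and hyperparameters are our assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConceptAdapter(nn.Module):
    """Tiny low-rank residual module trained per concept (hypothetical design)."""
    def __init__(self, channels: int, rank: int = 4):
        super().__init__()
        self.down = nn.Conv2d(channels, rank, kernel_size=1)
        self.up = nn.Conv2d(rank, channels, kernel_size=1)
        nn.init.zeros_(self.up.weight)  # adapter starts as an identity residual
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.down(x))

# Stand-in for a pre-trained text-conditioned generator; frozen for all concepts.
base = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(64, 3, 3, padding=1))
for p in base.parameters():
    p.requires_grad_(False)

adapter = ConceptAdapter(channels=64)                 # the only trainable weights
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

x = torch.randn(1, 3, 64, 64)                         # a distilled source image
feat = base[1](base[0](x))                            # frozen base features
out = base[2](adapter(feat))                          # per-concept correction
# ... compute the GAN / reconstruction loss on `out`, then opt.step() ...

# Only the adapter is saved per concept, instead of a full generator copy.
torch.save(adapter.state_dict(), "watercolor_adapter.pt")
```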
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6148