Keywords: architecture search, parameter-efficient tuning, image generation
Abstract: Large-scale text-to-image diffusion models, represented by Stable Diffusion, have achieved remarkable success in the field of image generation. Transferring pretrained diffusion models to downstream domains with parameter-efficient tuning (PEFT) methods such as Adapter and LoRA has become the most common paradigm. Despite their widespread use, there has been limited research systematically studying how the design of these components impacts the final tuning effectiveness.
In this paper, we investigate the automatic design of an optimal tuning architecture. Specifically, we employ a reinforcement learning-based neural architecture search method to automatically design the tuning architecture for PEFT of Stable Diffusion with few-shot training data. Our search space includes micro-structures similar to Adapter and LoRA, as well as their insertion positions.
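To make the search space concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of the two families of micro-structures it mentions, a LoRA-like low-rank residual and a bottleneck Adapter, together with a toy enumeration over candidate insertion sites. All names (LoRALayer, AdapterLayer, INSERTION_SITES) are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' implementation.
# It shows the kind of micro-structures (LoRA- and Adapter-like modules)
# and insertion positions a PEFT search space of this sort might enumerate.
import itertools
import torch
import torch.nn as nn


class LoRALayer(nn.Module):
    """LoRA-style low-rank residual update: x + B(A(x)), with A, B low-rank."""

    def __init__(self, dim: int, rank: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)  # A: dim -> rank
        self.up = nn.Linear(rank, dim, bias=False)    # B: rank -> dim
        nn.init.zeros_(self.up.weight)                # start as an identity map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.down(x))


class AdapterLayer(nn.Module):
    """Bottleneck adapter: x + up(act(down(x)))."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)                # start as an identity map
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


# A toy search space: micro-structure choice x insertion-site choice.
# Site names are hypothetical placeholders for UNet sub-blocks.
MICRO_STRUCTURES = {"lora": LoRALayer, "adapter": AdapterLayer}
INSERTION_SITES = ["self_attn", "cross_attn", "ffn"]

if __name__ == "__main__":
    dim = 320  # e.g., a Stable Diffusion UNet block width
    for name, site in itertools.product(MICRO_STRUCTURES, INSERTION_SITES):
        module = MICRO_STRUCTURES[name](dim)
        n_params = sum(p.numel() for p in module.parameters())
        print(f"candidate: {name} @ {site}, trainable params = {n_params}")
```

An RL-based controller would sample one such (micro-structure, insertion-site) configuration per step, fine-tune with it, and use the downstream score as the reward; the sketch above only enumerates the candidates.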
For effective searching and evaluation, we build a large-scale tuning dataset. Through our search, we obtain a novel tuning architecture that reduces the parameter count by 18% compared to the widely adopted LoRA approach while still substantially outperforming it across various downstream tasks. We also conduct an extensive analysis of the search results, aiming to provide the community with valuable insights into parameter-efficient tuning of large-scale diffusion models.
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3798