Polaris: Scaling Up Instruction-Guided Image Generation Towards Millions of Personalized Needs

ICLR 2026 Conference Submission 16876 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: instruction-driven image generation; adapter retrieval; efficient fine-tuning alternatives
Abstract: Users increasingly expect image generation models to quickly adapt to highly diverse and personalized requirements, such as producing images with distinctive styles or characteristics. Traditional approaches rely on fine-tuning, which is costly and difficult to scale. To cope with these limitations, the community has accumulated a growing library of fine-tuned modules and adapters, where each component targets specific generation needs and collectively serves as a foundation for handling new demands. This naturally raises a question: *instead of repeatedly training new models, can we systematically exploit this expanding ecosystem to better fulfill user instructions?* To this end, we present Polaris, an intelligent retrieval framework that automatically selects and integrates suitable models from the model zoo based on a user's instructions. The key insight is that harnessing such a massive and heterogeneous pool requires not only finding the most relevant modules among thousands of candidates, but also aligning them effectively for instruction-driven generation and editing. Polaris addresses this challenge by indexing over 6,500 checkpoints and 75,000 adapters, and retrieving the most relevant components given a user's input and instruction. In doing so, it delivers scalable, controllable, and well-aligned generation without any additional training.
Primary Area: generative models
Submission Number: 16876
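
The abstract describes retrieving the most relevant adapters from a large zoo given a user's instruction. As a rough illustration of that general idea only, the sketch below ranks a toy zoo by bag-of-words cosine similarity between the instruction and each adapter's text description. This is not the Polaris method: the scoring function, the `retrieve` helper, and the `ZOO` entries are all hypothetical, and a real system would more plausibly use learned embeddings and an approximate nearest-neighbor index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(instruction, zoo, k=2):
    """Return up to k adapter names whose descriptions best match the instruction.

    Adapters with zero similarity are dropped; ties are broken by name.
    """
    query = Counter(tokenize(instruction))
    scored = [
        (cosine(query, Counter(tokenize(description))), name)
        for name, description in zoo.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


# Hypothetical toy zoo: adapter name -> free-text description (assumed, not from the paper).
ZOO = {
    "watercolor-style": "soft watercolor painting style with pastel colors",
    "pixel-art": "retro pixel art style for game sprites",
    "anime-portrait": "anime style portrait of a character",
}

if __name__ == "__main__":
    print(retrieve("turn this photo into a watercolor painting", ZOO, k=1))
```

The selected adapters would then be loaded onto a base checkpoint before generation; that integration step is outside the scope of this sketch.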