Keywords: textual inversion, diffusion, personalized generation, text-to-image
TL;DR: We extend Textual Inversion to learn pseudo-words that represent a concept at different resolutions.
Abstract: We extend Textual Inversion to learn pseudo-words that represent a concept at different resolutions. This lets us generate images that use the concept at a chosen resolution and manipulate individual resolutions using language. Once the pseudo-words are learned, the user can generate images that agree with the original concept at different levels of detail: ``A photo of $S^*(0)$'' produces the exact object, while ``A photo of $S^*(0.8)$'' matches only the rough outlines and colors. Our framework lets us generate images that use different resolutions of an image (e.g., details, textures, styles) as separate pseudo-words that can be composed in various ways.
Student Paper: Yes
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2211.17115/code)