Keywords: image editing, diffusion models, visual prompting
TL;DR: We propose a framework for inverting visual prompts into editing instructions for text-to-image diffusion models.
Abstract: Text-conditioned image editing has emerged as a powerful tool for manipulating images.
However, in many situations, language can be ambiguous and ineffective in describing specific image edits.
When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas.
We present a method for image editing via visual prompting.
Given example pairs that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images.
We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions.
Our results show that with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.
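As a rough illustration of the core idea (not the paper's actual implementation), the sketch below optimizes a soft text-instruction embedding against a frozen instruction-conditioned diffusion editor so that it reproduces the given before-to-after edit; the learned embedding can then be reused as an editing direction on new images. The model class, its `denoising_loss` interface, and all hyperparameters are assumptions made for illustration.

```python
import torch

# Hypothetical stand-in for a frozen instruction-conditioned diffusion editor
# (e.g. an InstructPix2Pix-style model). The class name and loss signature are
# assumptions for illustration, not the paper's or any library's API.
class InstructEditModel(torch.nn.Module):
    def __init__(self, embed_dim: int = 768, seq_len: int = 77):
        super().__init__()
        self.embed_dim, self.seq_len = embed_dim, seq_len
        # A real model would hold a frozen U-Net, VAE, and text encoder here.

    def denoising_loss(self, before: torch.Tensor, after: torch.Tensor,
                       instr_emb: torch.Tensor) -> torch.Tensor:
        # Placeholder loss: a real implementation would noise the "after"
        # latent and score the denoising prediction conditioned on the
        # "before" image and the instruction embedding.
        return ((after - before) * instr_emb.mean()).pow(2).mean()


def invert_instruction(model: InstructEditModel,
                       before: torch.Tensor,
                       after: torch.Tensor,
                       steps: int = 1000,
                       lr: float = 1e-3) -> torch.Tensor:
    """Optimize a soft instruction so the frozen editor maps `before` to `after`."""
    instr_emb = torch.zeros(1, model.seq_len, model.embed_dim, requires_grad=True)
    opt = torch.optim.Adam([instr_emb], lr=lr)
    for _ in range(steps):
        loss = model.denoising_loss(before, after, instr_emb)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The detached embedding serves as a reusable "editing direction".
    return instr_emb.detach()
```

Applying the learned embedding to a new image would then amount to running the frozen editor conditioned on that embedding instead of an encoded text prompt.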
Supplementary Material: zip
Submission Number: 5901