IPAdapter-Instruct: Resolving Ambiguity in Image-Based Conditioning Using Instruct Prompts

Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon Donné

Published: 2024, Last Modified: 01 Mar 2026ECCV Workshops (9) 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Diffusion models continuously push the boundary of state-of-the-art image generation, but the process is hard to control with any nuance: practice proves that textual prompts are inadequate for accurately describing image style or fine structural details (such as faces). ControlNet [43] and IPAdapter [39] address this shortcoming by conditioning the generative process on imagery instead, but each individual instance is limited to modeling a single conditional posterior: for practical use-cases, where multiple different posteriors are desired within the same workflow, training and using multiple adapters is cumbersome. We propose IPAdapter-Instruct, which combines natural-image conditioning with “Instruct” prompts to swap between interpretations for the same conditioning image: style transfer, object extraction, both, or something else still? IPAdapterInstruct efficiently learns multiple tasks with minimal loss in quality compared to dedicated per-task models.

External IDs:dblp:conf/eccv/RowlesVNEKD24