Keywords: Few-shot Learning; Dense Prediction; Generative Model
Abstract: The ability to adapt to new, unseen tasks with only a handful of training examples is a key factor behind the unprecedented success of language models. In computer vision, however, few-shot adaptation has largely focused on adapting to new semantic categories or answering new visual questions. Adapting a model to dense vision tasks – depth estimation, surface normal estimation, semantic segmentation – has only been possible with large amounts of training data and with custom decoder heads, since the output space of each task varies widely: depth estimation, for instance, outputs continuous values, while semantic segmentation produces discrete categorical assignments. In this paper, we find that the diffusion prior can effectively adapt to a wide range of dense tasks, and building on this observation, we introduce an adaptation mechanism that exploits a pretrained diffusion model for 12 different dense vision tasks using only a few training examples. Moreover, adapting to a new task requires only modifying the input, without changing the internal parameters of the model. Our key insight is to reframe all dense prediction tasks, even those with continuous outputs, as a codebook-conditioned classification problem.
Specifically, we learn two sets of parameters: (1) concept embeddings that condition the diffusion model to encode task-specific representations in its attention masks; and (2) codebook embeddings that recombine discrete outputs into continuous ones. With this novel design, we achieve state-of-the-art few-shot results across 12 datasets.
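To make the codebook-conditioned classification idea concrete, the following is a minimal sketch (not the authors' code; the codebook size K, the `codebook` tensor, and the `decode` helper are assumptions for illustration) of how per-pixel logits over codebook entries could yield both discrete labels and, via a weighted recombination of codebook embeddings, continuous values such as depth.

```python
# Hypothetical sketch: one classification head over K codebook entries
# serving both discrete (segmentation) and continuous (depth) dense tasks.
import torch
import torch.nn.functional as F

K = 64                                              # assumed codebook size
codebook = torch.nn.Parameter(torch.randn(K, 1))    # learned codebook embeddings (scalar per entry, e.g. depth)

def decode(logits: torch.Tensor, task: str) -> torch.Tensor:
    """logits: (B, K, H, W) per-pixel scores over codebook entries."""
    if task == "segmentation":
        # Discrete task: take the most likely codebook entry per pixel.
        return logits.argmax(dim=1)                  # (B, H, W) class ids
    # Continuous task: softmax-weighted recombination of codebook
    # embeddings maps the classification output back to a real value.
    probs = F.softmax(logits, dim=1)                 # (B, K, H, W)
    return torch.einsum("bkhw,kc->bchw", probs, codebook)  # (B, 1, H, W)

# Example usage with dummy logits for a 2-image batch at 8x8 resolution.
logits = torch.randn(2, K, 8, 8)
depth = decode(logits, "depth")                      # continuous map
seg = decode(logits, "segmentation")                 # categorical map
```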
Supplementary Material: pdf
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12649