Keywords: visual attributes, diffusion, attribute subspace
TL;DR: We propose Attribute Subspace Inference Tasks and develop a training setup that enables generative models to infer shared semantic attributes from just a few example images without labels, and to generate attribute-consistent images.
Abstract: Recent advances in generative vision-language models have demonstrated remarkable capabilities in image synthesis, captioning, and multi-modal reasoning. Among their most intriguing behaviors is in-context learning: the ability to adapt to new tasks from just a few examples. While well studied in language models, this capability remains underexplored in the visual domain. Motivated by this, we explore how generative vision models can infer and apply visual concepts directly from image sets, without relying on text or labels. We frame this as an attribute subspace inference task: given a small set of related images, the model identifies their shared axis of variation and uses it to guide generation from a query image. During training, we use auxiliary groupings to provide weak structural supervision. At inference time, the model receives only unlabeled inputs and must generalize the visual concept from the example images alone. Our approach enables attribute-consistent image generation and opens a new direction for nonverbal concept learning in vision.
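To make the task setup concrete, the sketch below illustrates one possible reading of attribute subspace inference at inference time: encode a few unlabeled context images, estimate their shared direction of variation, and use it to steer a query embedding. All names here (ImageEncoder, infer_attribute_direction, the PCA-based direction estimate, the scalar step size) are hypothetical stand-ins for illustration only, not the paper's actual architecture or mechanism.

```python
# Illustrative sketch only: hypothetical interfaces standing in for whatever
# backbone and conditioning scheme the paper actually uses.
import torch
import torch.nn as nn


class ImageEncoder(nn.Module):
    """Toy encoder mapping images to embeddings (placeholder for a real backbone)."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def infer_attribute_direction(context_embeddings: torch.Tensor) -> torch.Tensor:
    """Estimate the shared axis of variation across context embeddings as the
    top principal component -- one simple instantiation of an 'attribute
    subspace'; the paper's actual inference mechanism may differ."""
    centered = context_embeddings - context_embeddings.mean(dim=0, keepdim=True)
    # torch.pca_lowrank returns (U, S, V); columns of V are principal directions.
    _, _, v = torch.pca_lowrank(centered, q=1)
    return v[:, 0]


if __name__ == "__main__":
    encoder = ImageEncoder()
    context_images = torch.randn(4, 3, 64, 64)  # few unlabeled examples sharing an attribute
    query_image = torch.randn(1, 3, 64, 64)     # query image to guide generation from

    with torch.no_grad():
        ctx_emb = encoder(context_images)
        direction = infer_attribute_direction(ctx_emb)
        query_emb = encoder(query_image)
        # Shift the query embedding along the inferred attribute direction;
        # a conditional generator (e.g. a diffusion decoder) would consume this
        # to produce an attribute-consistent image.
        edited_emb = query_emb + 1.0 * direction
    print(edited_emb.shape)  # torch.Size([1, 128])
```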
Supplementary Material: zip
Primary Area: generative models
Submission Number: 17019