Visual primitives as words: Alignment and interaction for compositional zero-shot learning

Published: 01 Jan 2025, Last Modified: 11 Apr 2025, Pattern Recognit. 2025, CC BY-SA 4.0
Abstract: Highlights
• We present a novel perspective in which a visual primitive can be regarded as a word.
• We propose VisPrompt, which lets visual primitives interact with sub-concepts in a prompt.
• VisPrompt readily aligns visual elements with text and enables interaction between vision and text.
• We propose multi-level visual prompts to achieve fine-grained alignment.
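The first highlight, treating a visual primitive as a word, can be illustrated with a toy sketch. It assumes a CLIP-style prompt template such as "a photo of [ATTR] apple", in which each position holds a token embedding. A visual feature for an attribute primitive is projected into the text embedding space and placed in the attribute slot, so it occupies the prompt exactly as a word would. This is not the VisPrompt implementation. The helpers `project` and `build_prompt`, the dimensions, and the weights are hypothetical and chosen arbitrarily; in a real model the projection would be learned. The sketch also does not model the multi-level visual prompts.

```python
from typing import Dict, List

Vector = List[float]


def project(feature: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Affine map W @ x + b; each row of `weight` produces one output dimension."""
    return [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]


def build_prompt(
    template: List[str],
    word_embeddings: Dict[str, Vector],
    slots: Dict[str, Vector],
) -> List[Vector]:
    """Turn a template into a sequence of token embeddings.

    Ordinary words are looked up in `word_embeddings`. Placeholder slots
    (e.g. "[ATTR]") are filled with the supplied vectors, so a projected
    visual primitive takes up one position, just like a word.
    """
    return [slots[tok] if tok in slots else word_embeddings[tok] for tok in template]


# Hypothetical toy setup: 3-d visual features, 2-d text embeddings.
word_embeddings = {
    "a": [0.1, 0.0],
    "photo": [0.0, 0.2],
    "of": [0.3, 0.3],
    "apple": [1.0, -1.0],  # object sub-concept kept as an ordinary word
}
proj_w = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.0]]  # 2 x 3 projection (arbitrary, normally learned)
proj_b = [0.0, 0.1]

visual_attr_feature = [0.2, 0.4, 0.6]  # stand-in for a pooled attribute (e.g. "sliced") feature
visual_word = project(visual_attr_feature, proj_w, proj_b)

# The visual primitive fills the attribute slot of the prompt.
prompt = build_prompt(
    ["a", "photo", "of", "[ATTR]", "apple"],
    word_embeddings,
    {"[ATTR]": visual_word},
)
```

Because the projected vector has the same dimension as the word embeddings, a text encoder could consume the resulting sequence unchanged. That shared embedding space is what makes regarding a visual primitive as a word a convenient way to align and mix the two modalities.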