Abstract: Highlights
• We present a novel perspective in which a visual primitive is regarded as a word.
• We propose VisPrompt, which lets visual primitives interact with the sub-concepts in a prompt.
• VisPrompt readily aligns visual elements with text and enables interaction between vision and text.
• We propose multi-level visual prompts to achieve fine-grained alignment.