Discovering the Hidden Vocabulary of DALLE-2

Published: 29 Nov 2022, Last Modified: 05 May 2023 | SBM 2022 Poster | Readers: Everyone
Keywords: text-to-image, generative models, dalle-2, adversarial examples
TL;DR: We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts.
Abstract: We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that "Apoploe vesrreaitais" means birds and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts are often consistent in isolation, and sometimes also in combination. We present our black-box method for discovering words that seem random but correspond to visual concepts. This creates important security and interpretability challenges.
Student Paper: Yes