Discovering the Hidden Vocabulary of DALLE-2

Published: 29 Nov 2022, Last Modified: 05 May 2023
Venue: SBM 2022 Poster
Readers: Everyone
Keywords: text-to-image, generative models, dalle-2, adversarial examples
TL;DR: We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts.
Abstract: We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that "Apoploe vesrreaitais" means birds and "Contarra ccetnxniams luryca tanniounons" (sometimes) means bugs or pests. We find that these prompts are often consistent in isolation and sometimes also in combination. We present our black-box method for discovering words that seem random but have some correspondence to visual concepts. This hidden vocabulary creates important security and interpretability challenges.
Student Paper: Yes