Mass-Producing Failures of Multimodal Systems with Language Models

Published: 21 Sept 2023 · Last Modified: 02 Nov 2023 · NeurIPS 2023 poster
Keywords: safety, red-teaming, robustness, explainability, failures, multimodal models, vision-language, natural-language explanations
TL;DR: Our system, MultiMon, exploits erroneous agreement to autonomously uncover failures in text-guided multimodal models
Abstract: Deployed multimodal models can fail in ways that evaluators did not anticipate. To find these failures before deployment, we introduce MultiMon, a system that automatically identifies systematic failures: generalizable, natural-language descriptions of categories of individual failures. To uncover systematic failures, MultiMon scrapes for examples of erroneous agreement: inputs that produce the same output but should not. It then prompts a language model to identify common categories and describe them in natural language. We use MultiMon to find 14 systematic failures of the CLIP text encoder (e.g., "ignores quantifiers"), each comprising hundreds of distinct inputs (e.g., "a shelf with a few/many books"). Because CLIP is the backbone for most state-of-the-art multimodal models, these inputs produce failures in Midjourney 5.1, DALL-E, VideoFusion, and others. MultiMon can also steer towards failures relevant to specific use cases, such as self-driving cars. We see MultiMon as a step towards evaluation that autonomously explores the long tail of potential system failures.
Supplementary Material: zip
Submission Number: 9715
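
To make the pipeline in the abstract concrete, the sketch below illustrates the erroneous-agreement check on the CLIP text encoder: two captions that should describe different scenes but whose embeddings nearly coincide. It assumes the openai/clip-vit-base-patch32 checkpoint loaded through Hugging Face transformers and an illustrative similarity threshold; it is a sketch of the idea under those assumptions, not the authors' implementation.

```python
# Minimal sketch of the "erroneous agreement" step: flag caption pairs that
# should denote different images but receive near-identical CLIP text embeddings.
# Assumptions (not from the paper): the openai/clip-vit-base-patch32 checkpoint
# and a hypothetical 0.95 cosine-similarity threshold.
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed checkpoint
tokenizer = CLIPTokenizer.from_pretrained(MODEL_NAME)
text_encoder = CLIPTextModelWithProjection.from_pretrained(MODEL_NAME).eval()

def embed(captions):
    """Return L2-normalized CLIP text embeddings for a list of captions."""
    inputs = tokenizer(captions, padding=True, return_tensors="pt")
    with torch.no_grad():
        embeds = text_encoder(**inputs).text_embeds
    return embeds / embeds.norm(dim=-1, keepdim=True)

def erroneous_agreement(caption_a, caption_b, threshold=0.95):
    """Flag a caption pair whose embeddings agree although their meanings differ.

    The 0.95 threshold is an illustrative choice; the paper's actual criterion
    may differ.
    """
    emb = embed([caption_a, caption_b])
    similarity = float(emb[0] @ emb[1])
    return similarity, similarity >= threshold

# Example pair from the abstract ("ignores quantifiers"):
sim, flagged = erroneous_agreement("a shelf with a few books",
                                   "a shelf with many books")
print(f"cosine similarity = {sim:.3f}, erroneous agreement = {flagged}")
# Pairs flagged this way would then be passed to a language model, which groups
# them into natural-language descriptions of systematic failures.
```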