Abstract: Automatically discovering failures in vision models under real-world settings remains an open challenge. This work shows how off-the-shelf, large-scale image-to-text and text-to-image models, trained on vast amounts of data, can be leveraged to find such failures automatically. We detail a pipeline that interrogates classifiers trained on ImageNet to find specific failure cases and discover spurious correlations. We also show that the approach scales to generating adversarial datasets that target specific classifier architectures. This work serves as a proof of concept for the utility of large-scale generative models in automatically discovering bugs in vision models in an open-ended manner.