Prefilled responses enhance zero-shot detection of AI-generated images

ACL ARR 2026 January Submission 8113 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Vision-Language Models, AI-Generated Image Detection, Zero-Shot Detection, Prefill-Guided Thinking, Deepfakes, Synthetic Media, Prompt Engineering, Model Interpretability, Robustness, Generalization, Multimodal Forensics
Abstract: Traditional supervised methods for detecting AI-generated images depend on large, curated datasets for training and fail to generalize to novel, out-of-domain image generators. As an alternative, we explore pre-trained Vision-Language Models (VLMs) for zero-shot detection of AI-generated images. We evaluate VLM performance on three diverse benchmarks encompassing synthetic images of human faces, objects, and animals produced by 16 different state-of-the-art image generators. While off-the-shelf VLMs perform poorly on these datasets, we find that their reasoning can be guided effectively through a simple prefilling of responses — a method we call Prefill-Guided Thinking (PGT). In particular, prefilling a VLM response with the phrase "Let's examine the style and the synthesis artifacts" improves the Macro F1 scores of three widely used open-source VLMs by up to 24%. We analyze this improvement by tracking models' answer confidence at incremental intervals during response generation. For some models, prefills counteract early overconfidence — akin to mitigating the Dunning-Kruger effect — leading to better detection performance.
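The abstract's core idea — seeding (prefilling) the assistant turn with a fixed phrase so the VLM continues its reasoning from it — can be sketched as a chat-message construction. This is a minimal illustration, not the authors' implementation; the message schema below mimics common VLM chat templates, and `build_pgt_messages` and the `image_ref` field are hypothetical names.

```python
# Prefill-Guided Thinking (PGT), as described in the abstract: instead of
# letting the VLM begin its answer from an empty assistant turn, we prefill
# that turn with a fixed phrase that steers the model's reasoning.

# The exact phrase reported to improve Macro F1 by up to 24%:
PREFILL = "Let's examine the style and the synthesis artifacts"


def build_pgt_messages(question: str, image_ref: str) -> list[dict]:
    """Build a chat whose assistant turn is pre-seeded with the PGT phrase.

    A VLM served with assistant-prefill support would continue generating
    from PREFILL rather than starting its response from scratch.
    The message schema here is illustrative; real VLM APIs differ.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},
                {"type": "text", "text": question},
            ],
        },
        # Partial (prefilled) assistant response: the model decodes the
        # continuation of this string instead of an empty completion.
        {"role": "assistant", "content": PREFILL},
    ]


msgs = build_pgt_messages(
    "Is this image AI-generated? Answer yes or no.", "example.png"
)
```

In APIs that accept a trailing assistant message as a partial completion, the model's output is appended to `PREFILL`, so its first decoded tokens already sit inside a forensic-analysis framing rather than an immediate yes/no guess.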
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality, vision question answering, misinformation detection and analysis, safety and alignment, calibration/uncertainty, chain-of-thought, prompting, robustness, benchmarking, interpretability
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 8113