Keywords: Vision Language Models, Image Forensics, AIGC Detection
TL;DR: This work fine-tunes a Vision Language Model on human-annotated data to classify AI-generated images and to explain where and why it judges an image to be synthetic.
Abstract: The rapid rise of image generation calls for detection methods that are both interpretable and reliable. Existing approaches, though accurate, act as black boxes and fail to generalize to out-of-distribution data, while multi-modal large language models (MLLMs) provide reasoning ability but often hallucinate. To address these issues, we construct FakeXplained, a dataset of AI-generated images annotated with bounding boxes and descriptive captions that highlight synthesis artifacts, forming the basis for human-aligned, visually grounded reasoning. Leveraging FakeXplained, we develop FakeXplainer, which fine-tunes MLLMs with a progressive training pipeline, enabling accurate detection, artifact localization, and coherent textual explanations. Extensive experiments show that FakeXplainer not only sets a new state of the art in detection and localization accuracy (98.2% accuracy, 36.0% IoU), but also demonstrates strong robustness and out-of-distribution generalization, uniquely delivering spatially grounded, human-aligned rationales.
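As a rough illustration only (not taken from the paper, whose exact schema is not specified here), a FakeXplained-style annotation record pairing an image-level label with artifact bounding boxes and captions might look like the following sketch; all field and class names are hypothetical.

```python
# Hypothetical sketch of a single FakeXplained-style annotation record.
# Field names and structure are illustrative assumptions, not the paper's schema.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ArtifactAnnotation:
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    caption: str                     # human-written description of the synthesis artifact


@dataclass
class FakeXplainedRecord:
    image_path: str
    is_ai_generated: bool
    artifacts: List[ArtifactAnnotation] = field(default_factory=list)


record = FakeXplainedRecord(
    image_path="images/sample_001.png",
    is_ai_generated=True,
    artifacts=[
        ArtifactAnnotation(
            bbox=(120, 45, 80, 60),
            caption="Left hand shows six fingers with inconsistent joint geometry.",
        )
    ],
)
```

A record of this shape would support all three training targets the abstract describes: the binary label for detection, the boxes for localization (evaluated via IoU), and the captions for grounded textual explanations.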
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2139