UniForge: A Unified Multimodal Large Model for Detecting All-Domain Forged Image

17 Sept 2025 (modified: 15 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Multimodal Large Model, Universal Forged Image Detection
TL;DR: This paper proposes a multimodal large model for universal forged image detection.
Abstract: With the rapid development and increasing diversification of image forgery techniques, existing detection methods show significant limitations in addressing emerging challenges. Current forgery techniques range from traditional methods such as image manipulation and text manipulation to emerging ones such as deepfakes and artificial-intelligence-generated content. However, most existing detection models are designed to detect or localize only a single type of forgery, and there is no universal solution that can handle multiple forgery methods. To address this challenge, this paper proposes a unified multimodal large model for detecting all-domain forged images, named UniForge. The model aims to provide a general forgery detection method that can effectively discriminate the authenticity of various types of images. At its core is a novel Vision-Fusion Large Language Model, which combines the powerful feature extraction capabilities of pre-trained vision models with the semantic understanding and reasoning abilities of large language models. We conduct extensive experimental evaluations on datasets covering multiple forgery types, including image manipulation, text manipulation, artificial-intelligence-generated images, and deepfakes. The results demonstrate that UniForge achieves state-of-the-art performance across all forgery categories and substantially outperforms existing methods, validating the effectiveness and strong generalization capability of our framework.
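The abstract does not specify how the vision model and the language model are fused. As a point of reference only, the sketch below shows one common pattern for this kind of vision-language fusion: image features from a (typically frozen) vision encoder are projected into the language model's embedding space, prepended to the text prompt, and a classification head reads out real vs. forged. All module names, dimensions, and design choices here are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a generic vision-LLM fusion detector (NOT the authors' UniForge
# implementation). Stand-in modules replace the real pre-trained backbones.
import torch
import torch.nn as nn


class VisionFusionDetector(nn.Module):
    def __init__(self, vision_dim=768, llm_dim=1024, num_classes=2):
        super().__init__()
        # Stand-in for a pre-trained vision backbone (e.g., a ViT); hypothetical.
        self.vision_encoder = nn.Sequential(nn.Flatten(1), nn.LazyLinear(vision_dim))
        # Projection mapping visual features into the language model's token space.
        self.proj = nn.Linear(vision_dim, llm_dim)
        # Stand-in for the language model: a small Transformer encoder.
        layer = nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True)
        self.llm = nn.TransformerEncoder(layer, num_layers=2)
        # Head predicting real vs. forged from the fused representation.
        self.head = nn.Linear(llm_dim, num_classes)

    def forward(self, image, text_embeds):
        # image: (B, 3, H, W); text_embeds: (B, T, llm_dim) prompt embeddings.
        vis_tokens = self.proj(self.vision_encoder(image)).unsqueeze(1)  # (B, 1, llm_dim)
        fused = torch.cat([vis_tokens, text_embeds], dim=1)              # (B, 1+T, llm_dim)
        return self.head(self.llm(fused)[:, 0])                         # (B, num_classes)


# Example usage with random tensors.
model = VisionFusionDetector()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 16, 1024))
print(logits.shape)  # torch.Size([2, 2])
```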
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8357