TRIDENT: Tri-modal Deepfake Perception, Detection, and Hallucination Grand Challenge

Wen-Huang Cheng; Hong-Han Shuai; Khoa D Doan; Hongxia Xie; Ling Lo; Jian-Yu Jiang-Lin; Kang-Yang Huang; Ling Zou

Published: 03 Apr 2026, Last Modified: 03 Apr 2026

Venue: ACMMM2026-MGC-Proposal

Readers: Everyone

License: CC BY 4.0

Keywords: Multimodal Deepfake Forensics, Interpretable Deepfake Detection, Explainable Forensic Reasoning, Detection Hallucination, Deepfake Evidence Grounding, Standardized Forensic Probing

TL;DR: TRIDENT challenges participants to eliminate Deepfake detection hallucinations by ensuring that every decision is logically grounded in accurately perceived generative artifacts.

Abstract: The rapid evolution of generative AI has ushered in an era of hyper-realistic, tri-modal forgeries spanning images, video, and audio. While detectors now report high numerical accuracy, modern forensic systems remain "black boxes" that often reach their results through stochastic shortcuts or dataset biases rather than grounded reasoning. This lack of interpretability leads to the Hallucination Dilemma, in which models justify correct classifications with non-existent artifacts, a critical failure mode in high-stakes legal and journalistic environments. We propose TRIDENT, the Tri-modal Deepfake Perception, Detection, and Hallucination Grand Challenge, a competition designed to shift the community toward accountable and explainable forensics. Built upon the large-scale TriDF benchmark (https://j1anglin.github.io/TriDF/), the challenge requires participants to move beyond binary classification. Models are evaluated across three interdependent dimensions: Perception (the ability to localize fine-grained artifacts), Detection (decision robustness across 16 forgery families), and Hallucination (the logical consistency between perceived evidence and final labels). By providing a standardized probing protocol built on Structured VQA and Open-Ended Reasoning tasks, TRIDENT establishes a rigorous framework for evaluating the next generation of Deepfake detectors. The challenge invites the multimedia research community to bridge the gap between human-centric evidence and AI-driven forensics, so that digital media authentication becomes as interpretable as it is accurate.
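
To make the Hallucination dimension concrete, the following is a minimal sketch of one way to check whether a model's stated evidence is consistent with annotated artifacts. It is not the official TRIDENT or TriDF metric: the `Prediction` type, the artifact names, and both scoring functions are hypothetical and serve only to illustrate the idea of penalizing justifications that cite non-existent artifacts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output: a label plus the artifacts it cites as evidence."""
    label: str                          # "fake" or "real"
    claimed_artifacts: frozenset[str]   # artifact names the model says it perceived


def hallucination_rate(pred: Prediction, annotated: frozenset[str]) -> float:
    """Fraction of claimed artifacts that are absent from the annotations.

    Assumption: annotations are a flat set of artifact names per sample.
    """
    if not pred.claimed_artifacts:
        return 0.0
    unsupported = pred.claimed_artifacts - annotated
    return len(unsupported) / len(pred.claimed_artifacts)


def is_grounded(pred: Prediction, annotated: frozenset[str]) -> bool:
    """A "fake" label is grounded only if it cites at least one artifact and
    every cited artifact is annotated; a "real" label should cite none."""
    if pred.label == "fake":
        return bool(pred.claimed_artifacts) and pred.claimed_artifacts <= annotated
    return not pred.claimed_artifacts


# Illustrative sample: the label is correct, but one cited artifact does not exist.
annotated = frozenset({"lip_sync_drift"})
hallucinated = Prediction("fake", frozenset({"lip_sync_drift", "extra_finger"}))
grounded = Prediction("fake", frozenset({"lip_sync_drift"}))

rate_hallucinated = hallucination_rate(hallucinated, annotated)
rate_grounded = hallucination_rate(grounded, annotated)
```

In this toy example both predictions carry the correct label, yet only the second would count as grounded, which is the distinction the Hallucination Dilemma describes.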

Submission Number: 23