Text Modality Oriented Image Feature Extraction for Detecting Diffusion-based DeepFake

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: DeepFake Detection, Diffusion, Feature Extraction
TL;DR: TOFE is a text-modality feature extraction method for effectively detecting diffusion-generated DeepFakes.
Abstract: The widespread use of diffusion methods enables the on-demand creation of highly realistic images, posing significant risks to the integrity and safety of online information and highlighting the need for DeepFake detection. Our analysis of features extracted by traditional image encoders across ten diffusion types reveals that low-level and high-level features each offer distinct advantages in identifying DeepFake images. Furthermore, because diffusion models generate highly realistic images, it is increasingly difficult to distinguish real from fake within the image domain. Building on these insights, we propose an effective representation beyond the image domain that captures both low-level and high-level features for detecting diffusion-based DeepFakes. Specifically, for a given target image, the representation is a corresponding text embedding that can guide a specific text-to-image model to generate the target image. Experiments across ten diffusion types, compared against five representative DeepFake detection baselines, demonstrate the efficacy of our proposed method.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 12427
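
Illustrative sketch (not the authors' implementation): the abstract describes the representation as a text embedding that guides a text-to-image model to generate the target image, but it does not give the optimization objective. The toy example below only illustrates the general idea of recovering such an embedding by gradient descent against a frozen generator. The linear map `W` is a hypothetical stand-in for the text-to-image model, and `generate`, `invert_embedding`, the squared-error loss, and all values are assumptions made for illustration.

```python
def generate(W, e):
    """Toy frozen 'generator' (assumption): maps an embedding e to an 'image' W @ e."""
    return [sum(w * x for w, x in zip(row, e)) for row in W]


def invert_embedding(W, target, steps=200, lr=0.1):
    """Gradient descent on e to minimise ||generate(W, e) - target||^2; W stays fixed."""
    e = [0.0] * len(W[0])
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(W, e), target)]
        # Gradient of the squared error with respect to e is 2 * W^T residual.
        grad = [
            2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(e))
        ]
        e = [x - lr * g for x, g in zip(e, grad)]
    return e


# Hypothetical 2-dimensional embedding and 4-dimensional "image".
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
true_e = [0.5, -2.0]
target = generate(W, true_e)

# The recovered embedding is the feature a downstream detector would consume.
recovered = invert_embedding(W, target)
```

In this toy setting the recovered embedding matches the one that produced the target. With a real diffusion model the generator is nonlinear and stochastic, so the loss, optimizer, and stopping rule would all differ from this sketch.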