Trinity Detector: Text-Assisted and Attention Mechanisms Based Spectral Fusion for Diffusion Generation Image Detection
Abstract: Artificial Intelligence Generated Content (AIGC) techniques, represented by text-to-image generation, have been maliciously exploited to produce deep forgeries, raising concerns about the trustworthiness of multimedia content. Experimental results demonstrate that traditional forgery detection methods adapt poorly to diffusion-generated scenarios, while existing diffusion-specific techniques lack robustness to post-processed images. In response, we propose the Trinity Detector, which integrates coarse-grained text features from a Contrastive Language-Image Pretraining (CLIP) encoder with fine-grained artifacts in the pixel domain to achieve semantic-level image detection, significantly enhancing model robustness. To increase sensitivity to the features of diffusion-generated images, we design a Multi-spectral Channel Attention Fusion Unit (MCAF) that adaptively fuses multiple preset frequency bands, dynamically adjusting the weight of each band, and then integrates the fused frequency-domain information with the spatial co-occurrence features of the two modalities. Extensive experiments validate that our Trinity Detector improves transfer detection performance across black-box datasets by an average of 14.3% over previous diffusion detection models and demonstrates superior performance on post-processed image datasets.
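To make the MCAF idea concrete, the sketch below shows one plausible way to realize "adaptively fusing preset frequency bands with channel attention and re-integrating them with spatial features" in PyTorch. The band boundaries, the squeeze-and-excitation style gates, and the final 1x1 fusion convolution are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical MCAF-style block: split features into preset frequency bands,
# re-weight each band with learned channel attention, fuse back with the
# spatial path. All design choices here are assumptions for illustration.
import torch
import torch.nn as nn


class MultiSpectralChannelAttentionFusion(nn.Module):
    def __init__(self, channels: int, band_edges=(0.0, 0.15, 0.4, 1.0)):
        super().__init__()
        self.band_edges = band_edges  # assumed radial band boundaries (fraction of max frequency)
        num_bands = len(band_edges) - 1
        # One channel-attention gate per frequency band (assumed SE-style design).
        self.gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 4, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, 1),
                nn.Sigmoid(),
            )
            for _ in range(num_bands)
        )
        # Fuse the re-weighted spectral path with the original spatial path.
        self.fuse = nn.Conv2d(channels * 2, channels, 1)

    def _band_masks(self, h, w, device):
        # Normalized radial frequency in [0, 1] on the shifted FFT grid.
        fy = torch.fft.fftshift(torch.fft.fftfreq(h, device=device))
        fx = torch.fft.fftshift(torch.fft.fftfreq(w, device=device))
        radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / (0.5 * 2 ** 0.5)
        masks = []
        last = len(self.band_edges) - 2
        for i, (lo, hi) in enumerate(zip(self.band_edges[:-1], self.band_edges[1:])):
            upper = (radius <= hi) if i == last else (radius < hi)
            masks.append(((radius >= lo) & upper).float())
        return masks

    def forward(self, x):
        b, c, h, w = x.shape
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        bands_out = 0
        for mask, gate in zip(self._band_masks(h, w, x.device), self.gates):
            # Reconstruct the spatial signal carried by this band only.
            band_spatial = torch.fft.ifft2(
                torch.fft.ifftshift(spec * mask, dim=(-2, -1)), norm="ortho"
            ).real
            # Dynamically re-weight the band per channel, then accumulate.
            bands_out = bands_out + gate(band_spatial) * band_spatial
        # Integrate fused frequency-domain information with the spatial features.
        return self.fuse(torch.cat([bands_out, x], dim=1))


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)  # e.g. already-fused text/pixel features
    mcaf = MultiSpectralChannelAttentionFusion(64)
    print(mcaf(feats).shape)  # torch.Size([2, 64, 32, 32])
```

The key design point the sketch tries to capture is that each band's contribution is not fixed: the per-band gates let the network emphasize whichever frequency range carries the diffusion artifacts for a given input, rather than hard-coding a high-pass or low-pass prior.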