Abstract: AI-generated image detectors have historically concentrated on generalization across generative models, often overlooking the critical challenge of cross-semantic generalizability. This limitation constrains how well detectors adapt to new semantic content in real-world settings. We propose Adaptive Test-Time Semantic Debiasing (ATTSD), a zero-shot approach that leverages the visual-semantic space of large pretrained vision-language models to dynamically align feature representations at test time, without requiring additional training data or annotations. To further enhance adaptability, we introduce a Semantic-Suppression strategy for hard-sample mining that adjusts the degree of semantic debiasing per sample according to properties of its Fourier transform. To assess cross-semantic generalizability, we present the Cross-Semantic AI-generated Image Detection dataset (CSAIID), a benchmark comprising diverse semantic categories that reflect real-world complexity. Extensive experiments show that ATTSD achieves state-of-the-art performance, particularly in cross-semantic scenarios, positioning it as a promising solution for detecting evolving AI-generated content.
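Since only the abstract is available here, the Python sketch below is purely illustrative of how test-time semantic debiasing with a per-sample, Fourier-derived strength might be realized; it is not the authors' implementation. It assumes CLIP-style image features and text prototypes, and all names (`high_freq_ratio`, `debias_features`, `semantic_prototypes`) are hypothetical.

```python
# Illustrative sketch only (assumptions, not the paper's code): remove the
# component of a CLIP-style image feature that aligns with semantic text
# prototypes, scaled per sample by the image's high-frequency spectral energy.
import torch
import torch.nn.functional as F

def high_freq_ratio(image: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    """Fraction of spectral energy above a radial frequency cutoff.

    `image` is (B, C, H, W); the ratio acts as a rough per-sample proxy
    for how much high-frequency (artifact-prone) content a sample carries.
    """
    spec = torch.fft.fftshift(torch.fft.fft2(image), dim=(-2, -1))
    energy = spec.abs() ** 2
    _, _, H, W = image.shape
    yy = torch.linspace(-0.5, 0.5, H).view(H, 1).expand(H, W)
    xx = torch.linspace(-0.5, 0.5, W).view(1, W).expand(H, W)
    radius = torch.sqrt(xx ** 2 + yy ** 2)
    high = (radius > cutoff).to(energy)               # high-frequency mask
    total = energy.sum(dim=(-3, -2, -1))
    return (energy * high).sum(dim=(-3, -2, -1)) / (total + 1e-8)

def debias_features(image_feat: torch.Tensor,
                    semantic_prototypes: torch.Tensor,
                    strength: torch.Tensor) -> torch.Tensor:
    """Suppress the semantic component spanned by text prototypes.

    image_feat: (B, D) image features; semantic_prototypes: (K, D) text
    embeddings of semantic categories; strength: (B,) values in [0, 1].
    The projection is approximate when prototypes are not orthonormal.
    """
    P = F.normalize(semantic_prototypes, dim=-1)      # (K, D)
    coeffs = image_feat @ P.T                         # (B, K) alignment scores
    semantic_part = coeffs @ P                        # (B, D) semantic component
    return image_feat - strength.unsqueeze(-1) * semantic_part

# Toy usage with random tensors standing in for real CLIP outputs.
images = torch.randn(4, 3, 224, 224)
feats = F.normalize(torch.randn(4, 512), dim=-1)
protos = torch.randn(10, 512)
w = high_freq_ratio(images).clamp(0, 1)               # per-sample strength
debiased = debias_features(feats, protos, w)
print(debiased.shape)  # torch.Size([4, 512])
```

The design intuition this sketch captures: subtracting the semantic component leaves a residual that depends less on image content, while modulating the subtraction strength per sample (here via spectral energy) avoids over-debiasing samples whose discriminative cues are weak.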