Bridging Modalities for Forgery Detection via Learnable Representations with Query-Guided Contrastive Learning

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: image manipulation localization, forged representation, bidirectional cross-attention, contrastive learning
Abstract: Image manipulation localization (IML) aims to identify tampered regions in edited images, which may range from object-level composites to subtle traces. Recent studies have begun to explore the integration of multi-source cues, such as RGB, high-frequency components, and noise, in pursuit of more precise localization. Despite this progress, the potential of cross-modal interaction and hierarchical perception deserves deeper investigation and exploitation. Inspired by how humans detect forgeries by dynamically zooming to capture holistic-local and semantic-detail cues, we propose BriQ (Bridge-Modality Query), a query-based framework that learns forgery-aware representations to perceive multi-scale information. In addition, we incorporate a structured attention mechanism to effectively model cross-modal interactions. To further enhance discriminative capability, we introduce query-to-regions contrastive learning (Q2R), which encourages the representations to capture the essential contrast between tampered and authentic regions and to aggregate forgery-related features, thereby significantly improving IML performance. Extensive experiments on multiple benchmark datasets validate BriQ's state-of-the-art effectiveness and robustness, while comprehensive ablation studies confirm the contribution of each component.
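The abstract's query-to-regions contrastive learning (Q2R) can be pictured as an InfoNCE-style objective in which a learned query embedding is pulled toward tampered-region features and pushed away from authentic-region features. The sketch below is an illustrative guess at one plausible form of such a loss, not the paper's actual implementation; the function name, pooling choice, and temperature are assumptions.

```python
import numpy as np

def q2r_contrastive_loss(query, tampered_feats, authentic_feats, tau=0.07):
    """Illustrative InfoNCE-style query-to-region loss (hypothetical form of Q2R).

    query           : (d,)  forged-aware query embedding
    tampered_feats  : (n, d) features pooled from tampered regions (positives)
    authentic_feats : (m, d) features pooled from authentic regions (negatives)
    tau             : temperature scaling the cosine similarities
    """
    def cos(a, b):
        # Cosine similarity with a small epsilon for numerical stability.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    # Positive term: similarity to the mean tampered-region feature.
    pos = np.exp(cos(query, tampered_feats.mean(axis=0)) / tau)
    # Negative terms: similarities to each authentic-region feature.
    negs = np.exp([cos(query, f) / tau for f in authentic_feats]).sum()
    # -log of the positive's share of the total similarity mass.
    return -np.log(pos / (pos + negs))
```

Minimizing this quantity drives the query to be more similar to tampered regions than to authentic ones, which matches the abstract's stated goal of capturing the contrast between the two and aggregating forgery-related features.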
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8243