In-Context Alignment: Resolving Representation Conflict for Parameter-Efficient Forgery Detection in Vision Models
Keywords: representation conflict, prompt learning, visual forensics, parameter-efficient learning
Abstract: Vision Foundation Models (VFMs) have shown remarkable potential in image forensics, yet their content-driven representations often suppress the subtle forensic cues essential for manipulation localization, revealing an inherent representation conflict. Conventional full fine-tuning struggles to resolve this conflict: it demands extensive parameter updates, risks overfitting, and erodes prior knowledge, leading to poor generalization across diverse forgery detection scenarios. We propose In-Context Alignment (ICA), a parameter-efficient framework that reframes forgery localization as a visual in-context learning task. ICA introduces two complementary prompting mechanisms within frozen VFMs: a Physical-Aware Prompter (PAP) that enhances suppressed low-level forensic signals such as noise and frequency artifacts, fusing them adaptively through a Mixture-of-Experts, and a Semantic-Aware Prompter (SAP) that steers the model to expose semantic inconsistencies in high-level features.
With only a small fraction of parameters updated, ICA achieves strong performance across diverse image forgery localization benchmarks and can even compete with fully fine-tuned models. Our results demonstrate that in-context alignment of semantic and forensic representations offers a scalable, robust, and efficient paradigm for advancing visual forensics.
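The paper does not provide implementation details, but the PAP's adaptive fusion can be illustrated with a minimal numpy sketch. Everything below is an assumption for illustration: the function name `pap_fuse`, the shapes, and the choice of a softmax gate over per-patch features are all hypothetical, standing in for the (unspecified) Mixture-of-Experts design. The key structural point it shows is that the backbone features stay frozen while only the small gating matrix would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pap_fuse(patch_feats, expert_outputs, gate_w):
    """Hypothetical Physical-Aware Prompter fusion step.

    patch_feats:    (N, D) frozen backbone patch features (not updated)
    expert_outputs: (E, N, D) low-level forensic branches, e.g. a
                    noise-residual expert and a frequency-artifact expert
    gate_w:         (D, E) learnable gating weights -- in this sketch,
                    the only parameters that would receive gradients
    """
    gates = softmax(patch_feats @ gate_w, axis=-1)        # (N, E) per-patch expert weights
    fused = np.einsum("ne,end->nd", gates, expert_outputs)  # weighted expert mixture
    return patch_feats + fused  # fused forensic prompt added to frozen features

# Toy dimensions: 4 patches, 8-dim features, 3 experts.
N, D, E = 4, 8, 3
feats = rng.normal(size=(N, D))
experts = rng.normal(size=(E, N, D))
gate = rng.normal(size=(D, E))
out = pap_fuse(feats, experts, gate)
print(out.shape)  # (4, 8)
```

Because the gate is conditioned on each patch's own features, different image regions can lean on different forensic experts, which is the "adaptive fusion" behavior the abstract attributes to the Mixture-of-Experts.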
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11007