Keywords: Image Forgery Localization, User-query Driven
Abstract: The rapid advancement of image editing technologies has amplified the urgency of developing reliable Image Forgery Localization (IFL) methods. Recent approaches based on Multimodal Large Language Models (MLLMs) have shown promise but suffer from $\textbf{weak visual-text alignment}$: they fail to direct visual attention to the specific regions mentioned in user queries, leading to irrelevant responses. We argue that this limitation originates from a $\textbf{global outcome driven}$ paradigm that directs interpretability toward forgery localization results and spreads visual attention over the entire image. To address this issue, we propose a paradigm shift: interpretability in IFL ought to be $\textbf{regional user-query driven}$. Building on this principle and supported by a dataset containing queries about the authenticity of specific regions, we present the $\textbf{U}$ser-query $\textbf{D}$riven $\textbf{I}$mage $\textbf{S}$hield (UDIS), a novel framework incorporating two key modules. The $\textbf{Query-Guided Module (QGM)}$ introduces a $\texttt{[QUERY]}$ token and a query-based visual feature filtering process to strengthen $\textbf{input-level}$ alignment (connecting the query with the MLLM’s visual attention). The $\textbf{Evidence-Aware Module (EAM)}$ introduces an $\texttt{[EVI]}$ token and an auxiliary authenticity evidence classification task to enhance $\textbf{output-level}$ alignment (associating explanatory text knowledge with forgery localization capability). Learning these two special tokens enhances the MLLM’s alignment ability, and the modal-consistency knowledge embedded in the tokens further supports the forgery localization process. Extensive experiments demonstrate that the proposed approach provides query-focused authenticity explanations, underscoring its superior practical value, and achieves state-of-the-art IFL performance.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 4620