Multi-view Feature Extraction via Tunable Prompts is Enough for Image Manipulation Localization

Xuntao Liu, Yuzhou Yang, Haoyue Wang, Qichao Ying, Zhenxing Qian, Xinpeng Zhang, Sheng Li

Published: 2024, Last Modified: 05 Mar 2025, ACM Multimedia 2024, CC BY-SA 4.0

Abstract: Deceptive images can spread quickly via social networking services, posing significant risks. Image Manipulation Localization (IML), a rapidly progressing field, seeks to address this issue. However, the scarcity of public training datasets for the IML task directly hampers model performance. To address this challenge, we propose Prompt-IML, a framework that leverages the rich prior knowledge of pre-trained models by employing tunable prompts. Specifically, sets of tunable prompts enable the frozen pre-trained model to extract multi-view features, including spatial and high-frequency features. This approach minimizes redundant architecture for feature extraction across different views, reducing training costs. In addition, we develop a plug-and-play Feature Alignment and Fusion module that integrates seamlessly into pre-trained models without additional structural modifications. The proposed module reduces noise and uncertainty in features through interactive processing. Experimental results show that the proposed method attains superior performance across 6 test datasets and demonstrates exceptional robustness.
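
The core idea of the abstract, one frozen encoder shared by several sets of tunable prompts, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: `FrozenEncoder`, `PromptedView`, the mean-based token mixing, and all sizes are hypothetical, and the sketch feeds the same tokens to both views because the abstract does not say how the high-frequency input is formed.

```python
import random


class FrozenEncoder:
    """Toy stand-in for a frozen pre-trained encoder (hypothetical)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Fixed projection weights: created once and never updated.
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                       for _ in range(dim)]

    def forward(self, tokens):
        # Crude token mixing: every token sees the mean of the whole sequence,
        # so prepended prompt tokens can influence the image-token outputs.
        n = len(tokens)
        context = [sum(t[j] for t in tokens) / n for j in range(self.dim)]
        mixed = [[t[j] + context[j] for j in range(self.dim)] for t in tokens]
        return [[sum(w * x for w, x in zip(row, tok)) for row in self.weight]
                for tok in mixed]


class PromptedView:
    """One set of tunable prompt tokens that defines one feature view."""

    def __init__(self, num_prompts, dim, seed):
        rng = random.Random(seed)
        # These are the only values a training loop would update.
        self.prompts = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                        for _ in range(num_prompts)]

    def extract(self, encoder, image_tokens):
        outputs = encoder.forward(self.prompts + image_tokens)
        # Drop the prompt positions; keep one feature per image token.
        return outputs[len(self.prompts):]


def count(values):
    """Number of scalar values in a list of rows."""
    return sum(len(row) for row in values)


dim, num_tokens, num_prompts = 8, 16, 4
rng = random.Random(42)
image_tokens = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                for _ in range(num_tokens)]

encoder = FrozenEncoder(dim)                             # shared by both views
spatial_view = PromptedView(num_prompts, dim, seed=1)    # hypothetical "spatial" prompts
frequency_view = PromptedView(num_prompts, dim, seed=2)  # hypothetical "high-frequency" prompts

spatial_feats = spatial_view.extract(encoder, image_tokens)
frequency_feats = frequency_view.extract(encoder, image_tokens)

tunable = count(spatial_view.prompts) + count(frequency_view.prompts)
frozen = count(encoder.weight)
```

In this sketch a second view costs only another small set of prompt values rather than a second encoder, which is the sense in which the approach avoids redundant architecture. The toy sizes say nothing about the parameter ratio in a real pre-trained model.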