TextSleuth: Towards Explainable Tampered Text Detection

ACL ARR 2025 May Submission7122 Authors

20 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Tampered text detection has attracted increasing attention due to its essential role in information security. Although existing methods can detect tampered text regions, their detections lack convincing interpretation and clarity, which makes the predictions unreliable. To address this problem, we propose to explain the basis of tampered text detection in natural language via large multimodal models. To bridge the data gap, we propose a large-scale, comprehensive dataset, ETTD, which contains both pixel-level annotations for tampered text regions and natural language annotations describing the anomalies of the tampered text. Multiple novel methods are employed to improve the quality of our dataset. To further improve explainable tampered text detection, we propose a simple yet effective model called TextSleuth, which detects tampered text using both visual and semantic clues and shows strong generalization across unfamiliar image styles and languages. Extensive experiments on both the ETTD dataset and the public dataset verify the effectiveness of the proposed methods. Our dataset and code will be made publicly available.

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: vision question answering, cross-modal application, multi-modality.

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis

Languages Studied: English, Chinese, Arabic, Bengali, Japanese, Korean

Submission Number: 7122
