Abstract: Text images, such as scanned documents, certificates, and chat screenshots, are commonly used for multimodal communication. However, the rise of advanced deep image editing tools raises concerns about the authenticity of text images. Unlike natural images, even subtle tampering in a text image, such as altering a single character, can substantially change its semantics. To address this issue, we propose a robust network framework for detecting and localizing forgeries in text images. The framework comprises forgery trace enhancement and multiscale attention-based forgery localization. Specifically, we introduce a deep module that enhances forgery traces across multiple domains, reducing distraction from the image background and facilitating forgery localization. We devise an efficient encoder-decoder network that fuses high-level semantics with low-level details through a multiscale attention module. Through joint restoration and multiscale forgery feature learning, even delicately tampered regions can be effectively exposed. To evaluate the framework under real-world distortion scenarios, we construct a large-scale dataset covering various tampering methods and distortions. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods in both accuracy and robustness, especially on small tampered regions.