Towards Fine-Grained Document Tampering Detection: New Dataset and Benchmark

Published: 2025, Last Modified: 25 May 2026. Venue: PRCV (7) 2025. License: CC BY-SA 4.0
Abstract: Document Tampering Detection (DTD) has attracted increasing attention due to the importance of the authenticity of textual information. Despite significant progress, existing methods focus primarily on localizing forged regions and overlook the identification of the manipulation method. To bridge this gap, we propose the Fine-Grained Document Tampering Detection (FGDTD) task, which simultaneously identifies tampered regions and tampering methods at the pixel level. For FGDTD, we introduce a new dataset along with corresponding evaluation metrics. The dataset contains 16,479 tampered images collected from 12 document image datasets in 3 languages and covers 8 advanced text tampering methods at both the word level and the character level. To build the benchmark, we propose a simple yet effective framework that extends several existing DTD methods with an extra tampering method identification head, and we evaluate the extended models on the proposed dataset. Extensive experiments provide a comprehensive analysis of the extended models on the FGDTD dataset. Furthermore, cross-domain evaluation demonstrates the value of FGDTD as a pre-training data source. To facilitate research in the community, the dataset and code will be publicly available at: https://git.io/FGDTD.
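To make the pixel-level task concrete, the following is a minimal sketch of how the outputs of a localization head and an extra tampering method identification head could be fused into one fine-grained label map. The abstract does not specify the fusion rule, so the function name `fuse_heads`, the 0.5 threshold, and the label convention (0 for authentic, 1 to 8 for the tampering method) are all assumptions made for illustration, not the authors' implementation.

```python
NUM_METHODS = 8  # the abstract reports 8 tampering methods


def fuse_heads(loc_prob, method_logits, threshold=0.5):
    """Fuse two per-pixel heads into one label map (hypothetical rule).

    loc_prob:      H x W nested list of tampering probabilities.
    method_logits: H x W x NUM_METHODS nested list of method scores.
    Returns an H x W nested list of ints, where 0 means authentic and
    k in 1..NUM_METHODS means "tampered with method k".
    """
    label_map = []
    for prob_row, logit_row in zip(loc_prob, method_logits):
        out_row = []
        for prob, logits in zip(prob_row, logit_row):
            if prob < threshold:
                # Localization head says the pixel is authentic.
                out_row.append(0)
            else:
                # Pixel is tampered: pick the highest-scoring method.
                best = max(range(len(logits)), key=lambda k: logits[k])
                out_row.append(best + 1)
        label_map.append(out_row)
    return label_map


# Toy 2 x 2 example with made-up scores.
zeros = [0.0] * NUM_METHODS
loc = [[0.1, 0.9], [0.7, 0.2]]
logits = [
    [zeros, [0, 0, 3, 0, 0, 0, 0, 0]],
    [[5, 0, 0, 0, 0, 0, 0, 0], zeros],
]
print(fuse_heads(loc, logits))  # [[0, 3], [1, 0]]
```

Gating the method prediction on the localization mask is only one possible design. A single head with 9 output classes would produce the same kind of label map, and the abstract does not say which variant the benchmark uses.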