ODTr: Transformer Integrating OCR Auxiliary Map and Image Depth Information for Document Image Unwarping
Abstract: In this study, we introduce a novel application of transformers to document image unwarping, leveraging depth information and Optical Character Recognition (OCR) results. Our proposed model integrates two key modules, a depth module and an OCR module, into a transformer framework tailored for document image processing. The depth module predicts the relative depth of each pixel in the document image, providing crucial spatial context for unwarping. Concurrently, the OCR module identifies regions suitable for OCR, which serve as preferences guiding the unwarping process. This hybrid approach aims to mitigate the text distortion inherent in document image unwarping and thereby improve OCR accuracy, although it may reduce the model's capability for structural image unwarping. Experimental results demonstrate our model's effectiveness, achieving a Character Error Rate (CER) of 24.81%, an absolute improvement of 6.2% over the baseline method, DocTr.
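The abstract describes an architecture in which per-pixel depth and OCR-suitability predictions are fed as auxiliary maps into a transformer that estimates the unwarping flow. The paper does not give implementation details here, so the following is only a minimal, hypothetical sketch of how such a fusion could be wired up in PyTorch; all module names, layer sizes, and the flow-based warping head are assumptions, not the authors' implementation.

```python
# Hypothetical sketch: transformer-based unwarping with a depth head and an
# OCR-suitability head whose outputs are concatenated to the image as
# auxiliary input maps. Names and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryHead(nn.Module):
    """Small convolutional head predicting a single-channel auxiliary map."""

    def __init__(self, in_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class ODTrSketch(nn.Module):
    """Toy stand-in: patch encoder over image + auxiliary maps, a transformer
    over the resulting tokens, and a flow head that predicts the backward
    sampling grid used to unwarp the input image."""

    def __init__(self, dim: int = 128, patch: int = 8):
        super().__init__()
        self.patch = patch
        self.depth_head = AuxiliaryHead(3)              # relative depth map
        self.ocr_head = AuxiliaryHead(3)                # OCR-suitability map
        self.encoder = nn.Conv2d(3 + 2, dim, patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.flow_head = nn.Linear(dim, 2 * patch * patch)  # per-patch 2-D flow

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        b, _, h, w = img.shape
        depth = self.depth_head(img)
        ocr = torch.sigmoid(self.ocr_head(img))
        # Fuse image and auxiliary maps, then tokenize into patches.
        x = self.encoder(torch.cat([img, depth, ocr], dim=1))   # (b, dim, h/p, w/p)
        hp, wp = x.shape[-2:]
        tokens = self.transformer(x.flatten(2).transpose(1, 2))  # (b, hp*wp, dim)
        # Expand per-token flow back to a dense (b, 2, h, w) field.
        flow = self.flow_head(tokens).view(b, hp, wp, 2, self.patch, self.patch)
        flow = flow.permute(0, 3, 1, 4, 2, 5).reshape(b, 2, h, w)
        # Squash to [-1, 1] sampling coordinates and resample the input.
        grid = torch.tanh(flow).permute(0, 2, 3, 1)
        return F.grid_sample(img, grid, align_corners=False)


# Usage: unwarped = ODTrSketch()(torch.rand(1, 3, 256, 256))
```

In this sketch the depth and OCR maps act purely as extra input channels; the paper's actual fusion strategy and loss terms (e.g., how OCR-suitable regions are weighted during unwarping) are not specified in the abstract.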