TCD: TEXT IMAGE CHANGE DETECTION FOR MULTILINGUAL DOCUMENT COMPARISON

21 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Optical Character Recognition, Semantic Segmentation, Change Detection, Text Image Change Detection
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a text image change detection model for multilingual document comparison to overcome the limitations of optical character recognition methods. Additionally, we created a dataset for benchmarking because no public datasets were available.
Abstract: In general, the core technology used in imaged document comparison is based on Optical Character Recognition (OCR). However, the main drawback of using OCR for document comparison is that users must pick the relevant language model for each document. Moreover, a multilingual document requires a multilingual OCR model, while a hybrid model suffers from poor recognition performance. To overcome these drawbacks, we propose a common Text image Change Detection (TCD) model for multilingual documents that uses unit-level text image-to-image comparison instead of text recognition. Our model generates change segmentation maps in both directions, from source to target and from target to source. Furthermore, we propose to use the correlation between multi-scale attention features, which reduces the need to pre-process text images for position and scale alignment. We created test data from printed and scanned documents in different languages and added public datasets such as the Distorted Document Images dataset (DDI-100) and the Document Binarization Dataset (LRDE DBD). Finally, we compare the performance of our model with state-of-the-art semantic segmentation and change detection (CD) models, and also with OCR models. Experimental results on benchmarks demonstrate that our model outperforms other semantic segmentation models by a good margin and achieves performance similar to that of OCR methods.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3003