An End-to-End OCR Text Re-organization Sequence Learning for Rich-Text Detail Image Comprehension

Liangcheng Li, Feiyu Gao, Jiajun Bu, Yongpan Wang, Zhi Yu, Qi Zheng

Published: 2020, Last Modified: 27 Jun 2023ECCV (25) 2020Readers: Everyone

Abstract: Nowadays the description of detailed images helps users know more about the commodities. With the help of OCR technology, the description text can be detected and recognized as auxiliary information to remove the visually impaired users’ comprehension barriers. However, for lack of proper logical structure among these OCR text blocks, it is challenging to comprehend the detailed images accurately. To tackle the above problems, we propose a novel end-to-end OCR text reorganizing model. Specifically, we create a Graph Neural Network with an attention map to encode the text blocks with visual layout features, with which an attention-based sequence decoder inspired by the Pointer Network and a Sinkhorn global optimization will reorder the OCR text into a proper sequence. Experimental results illustrate that our model outperforms the other baselines, and the real experiment of the blind users’ experience shows that our model improves their comprehension.

0 Replies