Efficient video text recognition using multiple frame integration

Xian-Sheng Hua, Pei Yin, HongJiang Zhang

Published: 2002, Last Modified: 06 Mar 2026. Venue: ICIP (2) 2002. License: CC BY-SA 4.0
Abstract: Text superimposed on video frames provides supplemental but important information for video indexing and retrieval. Many efforts have been made toward videotext detection and recognition (video OCR). The main difficulties of video OCR are low resolution and background complexity. We present efficient schemes that address the second difficulty by making full use of multiple frames containing the same text, so that every clear word can be extracted from these frames. First, we use multiple frame verification to reduce false alarms in text detection. We then select the frames in which the text is most likely to be clear and therefore more likely to be recognized correctly. Next, we detect and join every clear text block from those frames to form a clearer "man-made" frame. We then apply a block-based adaptive thresholding procedure to these "man-made" frames. Finally, the binarized frames are sent to an OCR engine for recognition. Experiments show that these methods increase the word recognition rate by more than 28%.
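The block-joining and block-based thresholding steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how block clarity is measured or how the per-block threshold is chosen, so this sketch assumes pixel variance as a stand-in for clarity and the block mean as the threshold. The function names (`compose_clearest`, `binarize_blocks`), the fixed block size, and the bright-text-on-dark polarity are hypothetical choices for illustration, not the authors' method. Frames are plain lists of rows of grayscale values, already aligned.

```python
from statistics import mean, pvariance


def split_blocks(height, width, block):
    """Yield (top, left, bottom, right) boxes that tile the image."""
    for top in range(0, height, block):
        for left in range(0, width, block):
            yield top, left, min(top + block, height), min(left + block, width)


def block_pixels(frame, box):
    """Return the pixel values of one block as a flat list."""
    top, left, bottom, right = box
    return [frame[r][c] for r in range(top, bottom) for c in range(left, right)]


def compose_clearest(frames, block=8):
    """Build one "man-made" frame by copying, for each block position,
    the block from the frame where it has the highest variance.
    ASSUMPTION: variance approximates clarity; the paper's criterion
    is not given in the abstract."""
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0] * width for _ in range(height)]
    for box in split_blocks(height, width, block):
        best = max(frames, key=lambda f: pvariance(block_pixels(f, box)))
        top, left, bottom, right = box
        for r in range(top, bottom):
            out[r][left:right] = best[r][left:right]
    return out


def binarize_blocks(frame, block=8):
    """Threshold each block at its own mean intensity.
    ASSUMPTION: mean threshold and bright text on a dark background
    (pixels above the block mean become 1)."""
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for box in split_blocks(height, width, block):
        threshold = mean(block_pixels(frame, box))
        top, left, bottom, right = box
        for r in range(top, bottom):
            for c in range(left, right):
                out[r][c] = 1 if frame[r][c] > threshold else 0
    return out
```

Given two aligned frames of the same text region, `binarize_blocks(compose_clearest([frame_a, frame_b]))` would produce the binary image that a pipeline of this kind passes to an OCR engine. A per-block threshold is used because a single global threshold tends to fail when background brightness varies across the text region.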