Abstract: Large amounts of ground truth data are vital for building, testing, analyzing, and improving the performance of character recognizers, especially those using segmentation-based routines. Ground truth information (the annotation) can be associated with document images at the paragraph level, the sentence level, the word level, and down to the character or stroke level. Producing large annotated datasets for this purpose manually is a very taxing and error-prone procedure. Therefore, to simplify the creation of recognizers, it is important to complement automatic tools for metadata extraction with tools that give experts an efficient human-computer interface for validation and correction. In this paper we present the first semi-automatic tool for annotating Arabic online handwritten documents. The tool automates and simplifies the visualization, manipulation, and annotation of documents at the character level, generating transcription files ready for use by any handwriting recognizer. It consists of a set of interactive user interfaces that guide the user through the whole process and reduce human effort and time by activating smart segmentation utilities, which offer satisfactory performance while allowing user intervention for validation.