Abstract: We introduce a (semi-)automatic OCR post-processing system that utilizes web-scale linguistic corpora in providing high-quality correction. This paper is a comprehensive system overview with the focus on the computational procedures, applied linguistic analysis, and processing optimization.
0 Replies
Loading