Document Enhancement System Using Auto-encoders

Published: 01 Nov 2019, Last Modified: 05 May 2023, DI 2019, Readers: Everyone
Keywords: Document Enhancement, document image cleanup, Deep Neural Networks, Watermark Removal, ResNets, Skip Connections
TL;DR: We designed and tested a REDNET (ResNet Encoder-Decoder) with 8 skip connections to remove noise, including blurring and watermarks, from documents, yielding a high-performance deep network for document image cleanup.
Abstract: Scanned documents are converted to digital form using Optical Character Recognition (OCR) software. This work focuses on improving the quality of scanned documents in order to improve the OCR output. We create an end-to-end document enhancement pipeline that takes in a set of noisy documents and produces clean ones. Denoising auto-encoders based on deep neural networks are trained to improve the OCR quality. We train a blind model that works across different noise levels in scanned text documents. Results are shown for the removal of blurring and watermark noise from noisy scanned documents.
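A denoising auto-encoder of the kind described in the abstract is trained on pairs of degraded and clean images. The following is a minimal sketch of how such pairs might be synthesized for the two noise types mentioned, blurring and watermarks. It is not the authors' data pipeline: the box blur, the alpha-blended watermark, and the names `box_blur`, `add_watermark`, `make_pair`, and `mse` are all assumptions made for illustration.

```python
# Hypothetical sketch: building (noisy, clean) training pairs for a
# denoising auto-encoder. Images are grayscale grids (lists of rows) with
# values in [0, 1], where 0 is black ink and 1 is white paper.
# The noise models below are assumptions, not the paper's method.

def box_blur(img, radius=1):
    """Replace each pixel with the mean of its in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def add_watermark(img, mark, alpha=0.3):
    """Alpha-blend a watermark grid of the same size over the page."""
    return [
        [(1 - alpha) * p + alpha * m for p, m in zip(page_row, mark_row)]
        for page_row, mark_row in zip(img, mark)
    ]


def make_pair(clean, mark, blur_radius=1, alpha=0.3):
    """Return (noisy, clean): the network input and its training target."""
    noisy = add_watermark(box_blur(clean, blur_radius), mark, alpha)
    return noisy, clean


def mse(a, b):
    """Mean squared error between two grids of equal size."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# A 4x4 white page with one vertical black stroke, and a uniform grey mark.
clean_page = [[1.0, 0.0, 1.0, 1.0] for _ in range(4)]
grey_mark = [[0.5] * 4 for _ in range(4)]
noisy_page, target = make_pair(clean_page, grey_mark)
```

Under these assumptions, varying `blur_radius` and `alpha` across samples would produce a range of noise levels, which is one plausible way to prepare data for a blind model. A reconstruction loss such as `mse` between the network output and `target` is a common training objective for denoising auto-encoders; the abstract does not state which loss the authors used.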