Abstract: Camera-captured document images usually suffer from various appearance degradations that obscure the content and hinder subsequent analysis and recognition systems. Most existing methods are tailored to one or a few specific degradations, making them applicable only in limited scenarios; in real-world applications, however, degradations are far more diverse and may co-occur within a single image. To remedy this limitation, we aim to enhance the appearance of camera-captured document images in the wild. Specifically, we propose a new end-to-end neural network, GCDRNet, which consists of two cascaded subnets: a global context learning network (GC-Net) and a detail restoration network (DR-Net). GC-Net models the global context of the document, while DR-Net restores fine details through a multiscale, multiloss training strategy. To train and evaluate GCDRNet in real-world scenarios, we construct a new benchmark, real-world document image appearance enhancement (RealDAE), which contains 600 real-world degraded document images carefully annotated with pixelwise alignment. To the best of our knowledge, RealDAE is the first dataset targeting multiple degradations in the wild. Extensive experiments demonstrate the advantages of GCDRNet and RealDAE over existing methods and datasets, respectively. In addition, experiments show that image appearance enhancement, used as a preprocessing step, can effectively improve the performance of downstream tasks such as text detection and recognition.
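For readers who want a concrete picture of the cascaded design mentioned above, the PyTorch sketch below illustrates one possible way to wire a global-context subnet ahead of a detail-restoration subnet. It is a minimal illustration, not the authors' implementation: the subnet architectures, the downscaling factor, the sigmoid outputs, and the fusion by channel concatenation are all assumptions made for clarity.

```python
# Minimal sketch of a two-stage cascade in the spirit of GC-Net -> DR-Net.
# Layer choices, the 1/4 downscaling, and the concatenation-based fusion
# are illustrative assumptions, not the paper's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU, shared by both hypothetical subnets."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class GCDRNetSketch(nn.Module):
    """Cascade: a global-context subnet runs on a downscaled input; a
    detail-restoration subnet refines the full-resolution image given the
    upsampled global prediction."""
    def __init__(self, ch=32):
        super().__init__()
        self.gc_net = nn.Sequential(ConvBlock(3, ch), nn.Conv2d(ch, 3, 1))
        self.dr_net = nn.Sequential(ConvBlock(6, ch), nn.Conv2d(ch, 3, 1))

    def forward(self, x):
        # Stage 1: global context at 1/4 resolution (assumed factor).
        low = F.interpolate(x, scale_factor=0.25, mode="bilinear",
                            align_corners=False)
        coarse = torch.sigmoid(self.gc_net(low))
        coarse_up = F.interpolate(coarse, size=x.shape[-2:], mode="bilinear",
                                  align_corners=False)
        # Stage 2: detail restoration conditioned on the coarse result.
        fine = torch.sigmoid(self.dr_net(torch.cat([x, coarse_up], dim=1)))
        return coarse_up, fine


if __name__ == "__main__":
    degraded = torch.rand(1, 3, 256, 256)      # dummy camera-captured image
    coarse, enhanced = GCDRNetSketch()(degraded)
    print(coarse.shape, enhanced.shape)        # both (1, 3, 256, 256)
```

In a sketch like this, a multiscale and multiloss training strategy would supervise both the coarse and the fine outputs (e.g., with per-pixel losses against the aligned ground truth), but the specific loss terms used in the paper are not reproduced here.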