Diffusion Models for Document Image Generation

Published: 01 Jan 2023, Last Modified: 05 Jul 2025, ICDAR (3) 2023, CC BY-SA 4.0
Abstract: Image generation has received wide attention in recent years; however, despite advances in image generation techniques, document image generation, which has wide industrial applications, has remained largely neglected. Previous research on structured document image generation used adversarial training, which is prone to mode collapse and overfitting and therefore yields lower sample diversity. Diffusion models have since surpassed earlier models on both conditional and unconditional image generation. In this work, we propose diffusion models for unconditional and layout-controlled document image generation. The unconditional model achieves a state-of-the-art FID of 14.82 for document image generation on DocLayNet. Furthermore, our layout-controlled document image generation models outperform the previous state of the art in both image fidelity and diversity. On the PubLayNet dataset, we obtain an FID of 15.02. On the more complex DocLayNet dataset, we obtain an FID of 20.58 for conditional image generation at \(256 \times 256\) resolution.
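The FID values reported above measure the Fréchet distance between Gaussian fits of image embeddings for real and generated samples (conventionally Inception-v3 pool features); lower is better. The sketch below is a minimal NumPy illustration of that formula, assuming the feature vectors have already been extracted (the feature extractor is not shown). The function names `frechet_distance` and `fid_from_features` are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Principal square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2}).
    Tr((sigma1 sigma2)^{1/2}) is computed as Tr((s1 sigma2 s1)^{1/2}) with
    s1 = sigma1^{1/2}; both matrices have the same eigenvalues, and the
    second one is symmetric, so a symmetric eigensolver can be used.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    s1 = _sqrtm_psd(sigma1)
    middle = s1 @ sigma2 @ s1
    middle = (middle + middle.T) / 2.0  # re-symmetrize against round-off
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID from two (num_samples, feature_dim) arrays of image embeddings."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sig_r = np.cov(feats_real, rowvar=False)  # columns are feature dimensions
    sig_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)
```

For example, two identical Gaussians give a distance of 0, and shifting the mean by \((3, 4)\) with identity covariances gives 25. In practice, reported FID values also depend on the number of samples, the feature extractor, and image preprocessing, so scores are most comparable when computed with the same evaluation pipeline.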