DocGenie: A Framework for High-Fidelity Synthetic Document Generation via Seed-Guided Multimodal LLM and Document-Aware Evaluation

Published: 06 May 2025, Last Modified: 06 May 2025, Venue: SynData4CV, Readers: Everyone, License: CC BY 4.0
Keywords: Synthetic Document Generation, Multimodal Large Language Models (MLLMs), Seed-Guided Generation, Document-Aware Evaluation, Visual Realism Enhancement, Document Understanding
TL;DR: DocGenie is a framework for generating high-fidelity synthetic business documents using a seed-guided multimodal LLM and evaluating them with a document-aware metric, Layout-FID.
Abstract: Obtaining large-scale, high-quality datasets for document understanding tasks such as optical character recognition, key information extraction, and layout analysis is costly and time-consuming. Synthetic document generation offers a scalable alternative, but achieving visual realism, structural coherence, and semantic alignment remains a challenge. This work presents DocGenie, a framework for generating high-fidelity, domain-adaptable synthetic business documents using a frontier multimodal large language model (MLLM). DocGenie leverages seed examples to guide HTML-based document generation, aligning outputs with domain-specific content and layout conventions. To evaluate quality and similarity, DocGenie introduces Layout-FID, a document-aware adaptation of Fréchet Inception Distance that replaces InceptionV3 with LayoutLMv3 embeddings. Layout-FID better captures textual, structural, and visual features, yielding more reliable scores across various business document categories: invoices, receipts, forms, and budgets. To enhance the visual realism of the generated documents, two post-processing strategies are explored: distortions derived from seed documents via (i) human inspection and (ii) MLLM-based prediction. This comparative study assesses their effectiveness across document categories with varying realistic distortion profiles. DocGenie thus offers a practical and extensible solution for realistic synthetic document generation and evaluation tailored for document AI workflows.
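The abstract describes Layout-FID as Fréchet Inception Distance with LayoutLMv3 embeddings in place of InceptionV3 features. The sketch below illustrates only the Fréchet-distance step, using the standard FID formula on two sets of embedding vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `frechet_distance` is hypothetical, the embeddings are random stand-ins, and how LayoutLMv3 outputs are pooled into one vector per document is not specified in the abstract and is not shown here.

```python
import numpy as np


def frechet_distance(real_emb, synth_emb):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Each input is an (n_documents, embedding_dim) array. In a Layout-FID
    setting these would be per-document LayoutLMv3 embeddings (assumed to be
    pooled to one vector per document; that step is outside this sketch).
    """
    real_emb = np.asarray(real_emb, dtype=np.float64)
    synth_emb = np.asarray(synth_emb, dtype=np.float64)

    # Fit a Gaussian (mean, covariance) to each set of embeddings.
    mu_r = real_emb.mean(axis=0)
    mu_s = synth_emb.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real_emb, rowvar=False))
    cov_s = np.atleast_2d(np.cov(synth_emb, rowvar=False))

    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s. These are real and non-negative in exact
    # arithmetic, so small numerical imaginary/negative parts are discarded.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seed_docs = rng.normal(size=(200, 8))   # stand-in for seed-document embeddings
    generated = rng.normal(size=(200, 8))   # stand-in for synthetic-document embeddings
    print(frechet_distance(seed_docs, generated))
```

Lower values mean the two embedding distributions are closer. With real data, the sample count should comfortably exceed the embedding dimension so that the covariance estimates are stable.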
Submission Number: 25