IDSPACE: A Model-Guided Synthetic Identity Document Generation Framework and Dataset

ICLR 2026 Conference Submission 14105 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Identity Documents, Identity Fraud Detection, Model Evaluation, Synthetic Data Generation
TL;DR: IDSPACE: a novel synthetic identity document generation methodology that leverages model guidance to improve data quality.
Abstract: Evaluating identity document fraud detection models provided by vendors or merchants is hampered by a lack of data. To address this, we propose IDSPACE, a cost-effective framework for generating high-quality synthetic identity documents. IDSPACE can generate a large number of identity documents from only a small number of target-domain documents, while overcoming limitations imposed by privacy constraints and ensuring that evaluation results on our synthetic images are consistent with those on images from the target domain. The framework also allows advanced users to flexibly specify metadata regarding the entities, capturing devices, and backgrounds of the documents to be generated. To achieve these benefits, IDSPACE introduces two key innovations: (1) abstracting the synthetic data generation process as a function of control parameters and metadata, thereby decoupling user-centric metadata customization from automatic parameter tuning; and (2) a model-guided few-shot document generation methodology that employs Bayesian optimization (BO) to align generated documents with the target domain, ensuring fidelity and utility for model evaluation using minimal samples from the target domain.
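The decoupling described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the submission: the `Metadata` fields, the toy `generate` renderer (which returns a two-number feature vector rather than an image), the toy `model_score` function, and the `discrepancy` objective are all hypothetical. Plain random search also stands in for Bayesian optimization here, so the sketch shows only the shape of the loop (metadata fixed by the user, control parameters tuned automatically against a few target-domain scores), not the authors' method.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    # Hypothetical user-facing fields; the real framework's schema is not given.
    entity_name: str
    device: str
    background: str


def generate(metadata, params):
    """Toy stand-in for a document renderer.

    Returns a small feature vector instead of an image. `params` is a
    hypothetical pair of control parameters (blur, noise) in [0, 1].
    """
    blur, noise = params
    # Metadata contributes a fixed offset; control parameters shift the features.
    base = (len(metadata.entity_name) % 3) * 0.01
    return (base + blur, base + noise)


def model_score(features):
    """Toy stand-in for the score of a fraud-detection model under evaluation."""
    return 0.6 * features[0] + 0.4 * features[1]


def discrepancy(params, metadata_list, target_scores):
    """Gap between the mean model score on synthetic and on target documents."""
    synthetic = [model_score(generate(m, params)) for m in metadata_list]
    mean_synthetic = sum(synthetic) / len(synthetic)
    mean_target = sum(target_scores) / len(target_scores)
    return abs(mean_synthetic - mean_target)


def tune(metadata_list, target_scores, n_trials=200, seed=0):
    """Random search over control parameters (a stand-in for BO, not BO itself)."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        candidate = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
        loss = discrepancy(candidate, metadata_list, target_scores)
        if loss < best_loss:
            best_params, best_loss = candidate, loss
    return best_params, best_loss
```

In this sketch the user only ever touches `Metadata`, while `tune` adjusts the control parameters. A Bayesian optimizer would replace the uniform sampling in `tune` with candidates proposed by a surrogate model, which matters when each evaluation of the objective is expensive.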
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 14105