Evaluating Text-to-Image Diffusion Models for Texturing Synthetic Data

Published: 06 May 2025 · Last Modified: 09 May 2025 · SynData4CV · CC BY 4.0
Keywords: Synthetic data generation, text-to-image diffusion models, object-centric representation learning
TL;DR: We compare text-to-image diffusion models with random textures for synthetic data generation and find that diffusion models do not outperform the simpler method of using random textures.
Abstract: Building computer vision systems that can handle diversity in objects or environments often requires large amounts of data, which can be difficult to collect. Synthetic data generation offers a promising alternative, but limiting the sim-to-real gap requires significant engineering effort. To reduce this effort, we investigate the use of pretrained text-to-image diffusion models for texturing synthetic images. In particular, we compare diffusion-based texturing with random textures, a common domain randomization technique in synthetic data generation. We evaluate the texturing approaches on two object-centric representations, keypoints and segmentation masks, and measure their efficacy on real-world datasets for three object categories: shoes, T-shirts, and mugs. Surprisingly, we find that texturing using a diffusion model performs on par with random textures, despite generating seemingly more realistic images. Our results suggest that, for now, using diffusion models for texturing does not provide advantages over the conceptually simpler method of using random textures.
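
For intuition, below is a minimal, hypothetical sketch of the two texturing strategies being compared: pasting random textures inside the object mask (domain randomization) versus inpainting the masked region with a pretrained text-to-image diffusion model. The model checkpoint, prompts, file names, and helper functions are assumptions for illustration only and do not reflect the paper's actual pipeline.

```python
# Illustrative sketch only: two ways to texture a rendered object given its
# binary mask. Model choice and helper names are assumptions, not the
# paper's implementation.
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline


def random_texture(image: Image.Image, mask: Image.Image) -> Image.Image:
    """Domain randomization baseline: fill the object mask with random noise."""
    img = np.array(image)
    m = np.array(mask.convert("L")) > 127  # boolean object mask
    texture = np.random.randint(0, 256, img.shape, dtype=np.uint8)
    img[m] = texture[m]
    return Image.fromarray(img)


def diffusion_texture(image: Image.Image, mask: Image.Image, prompt: str) -> Image.Image:
    """Diffusion-based texturing: inpaint the masked object region from a text prompt."""
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",  # assumed checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe(prompt=prompt, image=image, mask_image=mask).images[0]


# Hypothetical usage on one synthetic render and its object mask:
# render = Image.open("render.png").convert("RGB")
# mask = Image.open("object_mask.png")
# baseline = random_texture(render, mask)
# textured = diffusion_texture(render, mask, "a photo of a leather shoe")
```

Both variants keep the object's geometry (and hence its keypoint and mask annotations) fixed and only change appearance, which is what allows the two texturing approaches to be compared on the same downstream tasks.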
Submission Number: 9