CTIP: Towards Accurate Tabular-to-Image Generation for Tire Footprint Generation

Published: 01 Jan 2025 · Last Modified: 20 May 2025 · WACV 2025 · CC BY-SA 4.0
Abstract: Generating images directly from tabular data while ensuring an accurate representation of the ground truth is a useful application in manufacturing. Simply embedding tabular data and using it as a condition in image generation models often fails to learn the correspondence between tabular features and their impact on the generated image. To overcome this limitation, we propose Contrastive Tabular-Image Pre-training (CTIP), inspired by the CLIP framework. This pre-training improves the quality of the tabular encoder's embeddings, which in turn improves the performance of the image generation model. CTIP applies contrastive learning to pairs of tabular and image data, allowing the model to learn how changes in specific tabular features affect the corresponding images. This is particularly crucial in manufacturing, where accurately capturing product outcomes under varying conditions is essential. We demonstrate that applying CTIP enhances image generation performance, yielding images that closely match ground truth images even in Feature Few-shot or Feature Zero-shot scenarios, where specific features are sparse or novel. We further show an application of CTIP in tire development, where tire footprint images are generated from tire specifications and test conditions. CTIP produces high-quality embeddings that align well with ground truth images and effectively handles the scarcity or sparseness of specific features, addressing common challenges in new product development. Our code is available at https://github.com/Noverse0/CTIP.git.
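For intuition, the sketch below shows a CLIP-style symmetric contrastive objective applied to paired tabular and image embeddings, as the abstract describes. It is a minimal illustration, not the paper's actual implementation: the `TabularEncoder` architecture, embedding dimension, and temperature are assumed values chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularEncoder(nn.Module):
    """Illustrative MLP mapping a tabular feature vector into a shared embedding space
    (assumed architecture; the paper's encoder may differ)."""
    def __init__(self, num_features: int, embed_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, x):
        return self.net(x)

def ctip_contrastive_loss(tab_emb, img_emb, temperature: float = 0.07):
    """Symmetric InfoNCE loss over a batch of (tabular, image) embedding pairs,
    as in CLIP: the matched pair on each row/column diagonal is the positive."""
    tab_emb = F.normalize(tab_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    logits = tab_emb @ img_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_t2i = F.cross_entropy(logits, targets)          # tabular -> image direction
    loss_i2t = F.cross_entropy(logits.t(), targets)      # image -> tabular direction
    return 0.5 * (loss_t2i + loss_i2t)

# Toy usage: a batch of 8 tabular rows (e.g. tire specifications) with 16 features,
# paired with stand-in image embeddings from an image encoder.
tab = TabularEncoder(num_features=16)(torch.randn(8, 16))
img = torch.randn(8, 256)
print(ctip_contrastive_loss(tab, img).item())
```

After such pre-training, the tabular encoder's embeddings can condition an image generation model, which is the role the abstract attributes to CTIP.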