Syn3DTxt: Embedding 3D Cues for Scene Text Generation

Published: 31 Mar 2025 · Last Modified: 31 Mar 2025 · SyntaGen 2025 Poster · CC BY 4.0
Keywords: Scene text rendering, Scene text editing, Synthetic datasets, 3D context, Spatial geometric cues, Data augmentation.
TL;DR: This paper introduces a new synthetic dataset standard, enabling more robust and accurate scene text rendering in complex real-world environments.
Abstract: This study investigates the challenge of insufficient three-dimensional context in synthetic datasets for scene text rendering. Although recent advances in diffusion models and related techniques have improved certain aspects of scene text generation, most existing approaches still rely on 2D data, sourcing authentic training examples from movie posters and book covers, which limits their ability to capture the complex interactions between spatial layout and visual effects in real-world scenes. In particular, traditional 2D datasets do not provide the geometric cues needed to embed text accurately into diverse backgrounds. To address this limitation, we propose a novel standard for constructing synthetic datasets that incorporates surface normals to enrich three-dimensional scene characteristics. By adding surface normals to conventional 2D data, our approach enhances the representation of spatial relationships and provides a more robust foundation for future scene text rendering methods. Extensive experiments demonstrate that datasets built under this new standard offer improved geometric context, facilitating further advances in text rendering under complex 3D spatial conditions.
Submission Number: 19
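
This page does not include the authors' code, so as a rough illustration of the proposed standard, the Python sketch below shows one plausible way to pair a conventional 2D scene-text image with a per-pixel surface normal map. The function make_sample, the dictionary layout, and the flat-plane toy example are assumptions made for illustration, not the paper's actual implementation.

import numpy as np

def make_sample(rgb: np.ndarray, normals: np.ndarray) -> dict:
    """Bundle a rendered scene-text image with a per-pixel surface
    normal map, the extra geometric cue the proposed standard adds
    on top of conventional 2D data. (Hypothetical sample layout.)

    rgb:     (H, W, 3) uint8 image with the text already composited.
    normals: (H, W, 3) float32 normals, roughly unit-length, in [-1, 1].
    """
    assert rgb.shape[:2] == normals.shape[:2], "image/normal size mismatch"
    # Re-normalize so every pixel stores a unit-length normal vector.
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.clip(length, 1e-6, None)
    return {"image": rgb, "normal_map": normals.astype(np.float32)}

# Toy example: a flat, camera-facing surface has all normals along +z;
# a tilted or curved surface would instead vary per pixel.
h, w = 256, 256
rgb = np.zeros((h, w, 3), dtype=np.uint8)
flat = np.zeros((h, w, 3), dtype=np.float32)
flat[..., 2] = 1.0
sample = make_sample(rgb, flat)
print(sample["normal_map"].shape)  # -> (256, 256, 3)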