T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Synthesis in Controllable Concept Art Generation

TMLR Paper5746 Authors

27 Aug 2025 (modified: 03 Sept 2025)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: 2D concept art generation for 3D scenes is a crucial yet challenging task in computer graphics, as creating natural intuitive environments still demands extensive manual effort in concept design. While generative AI has simplified 2D concept design via text-to-image synthesis, it struggles with complex multi-instance scenes and offers limited support for structured terrain layout. In this paper, we propose a Training-free Triplet Tuning for Sketch-to-Scene (T3-S2S) generation after reviewing the entire cross-attention mechanism. This scheme revitalizes the ControlNet model for detailed multi-instance generation via three key modules: Prompt Balance ensures keyword representation and minimizes the risk of missing critical instances; Characteristic Priority emphasizes sketch-based features by highlighting TopK indices in feature channels; and Dense Tuning refines contour details within instance-related regions of the attention map. Leveraging the controllability of T3-S2S, we also introduce a feature-sharing strategy with dual prompt sets to generate layer-aware isometric and terrain-view representations for the terrain layout. Experiments show that our sketch-to-scene workflow consistently produces multi-instance 2D scenes with details aligned with input prompts.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yanwei_Fu2
Submission Number: 5746
Loading