Golden RPG: Semantic-Aware Noise for Regional Text-to-Image Generation

18 Sept 2025 (modified: 27 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Text-to-Image Generation, Diffusion Models, RPG Framework, Golden Noise, Multimodal LLMs
Abstract: We propose Golden RPG, a framework that integrates Golden Noise optimization with the RPG (Recaptioning, Planning, and Generating) paradigm to address the disconnect between noise characteristics and regional semantic requirements in text-to-image generation. The approach bridges two complementary paradigms: text prompt generation (RPG) provides strategic planning through regional decomposition, while noise prompt generation (Golden Noise) provides tactical execution through semantic-aware noise optimization. This integration targets the regional semantic mismatch problem, in which different image regions require distinct visual characteristics depending on their semantic importance and complexity. The framework keeps RPG's three-stage structure but replaces uniform random noise initialization with region-specific Golden Noise, so that each region starts from noise aligned with its semantic content. Compared to baseline RPG, experiments show a 24% improvement in regional semantic alignment, a 28% improvement in cross-region coherence, and a 36% improvement in multi-object composition quality. These results suggest that fusing complementary paradigms can address limitations that neither method overcomes alone, providing a foundation for complex compositional text-to-image generation.
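As an illustration of what region-specific noise initialization could look like, the following is a minimal sketch only, not the authors' implementation. It draws one shared Gaussian latent and then rewrites each planned region with its own refiner. The names `compose_regional_noise`, `shift_refiner`, and `identity` are hypothetical, the latent is a plain 2D list rather than a real diffusion latent tensor, and the constant-shift refiner is a toy stand-in for a learned, prompt-conditioned Golden Noise model, which the abstract does not specify.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch (not taken from the paper).
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Patch = List[List[float]]
Refiner = Callable[[Patch], Patch]


def gaussian_patch(height: int, width: int, rng: random.Random) -> Patch:
    """Draw a height x width patch of i.i.d. standard normal noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def identity(patch: Patch) -> Patch:
    """Leave a region's noise unchanged (uniform random initialization)."""
    return patch


def shift_refiner(delta: float) -> Refiner:
    """Toy stand-in for a learned noise refiner: adds a constant offset.

    ASSUMPTION: a real refiner would be conditioned on the region's
    sub-prompt; a constant shift is used here only to keep the sketch runnable.
    """
    def refine(patch: Patch) -> Patch:
        return [[value + delta for value in row] for row in patch]
    return refine


def compose_regional_noise(
    height: int,
    width: int,
    regions: Sequence[Tuple[Box, Refiner]],
    seed: int = 0,
) -> Patch:
    """Build an initial latent whose regions are refined independently."""
    rng = random.Random(seed)
    latent = gaussian_patch(height, width, rng)  # shared base noise
    for (top, left, bottom, right), refine in regions:
        # Cut the region out of the base noise, refine it, and paste it back.
        patch = [row[left:right] for row in latent[top:bottom]]
        refined = refine(patch)
        for offset, row in enumerate(refined):
            latent[top + offset][left:right] = row
    return latent


# Two side-by-side regions, as a planner might produce for a two-object prompt.
regions = [((0, 0, 4, 2), shift_refiner(0.1)), ((0, 2, 4, 4), identity)]
noise = compose_regional_noise(4, 4, regions, seed=42)
```

With an empty region list the function reduces to ordinary seeded Gaussian initialization, which corresponds to the baseline behavior the abstract says is being replaced.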
Primary Area: generative models
Submission Number: 13950