LangDriveEdit: Language-Driven Image Editing for Street Scenes

10 Sept 2025 (modified: 15 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Image editing, autonomous driving, driving scenes, computer vision
TL;DR: Image editing for driving scenes
Abstract: Ensuring the safety of autonomous driving systems requires rigorous evaluation across diverse street-scene conditions within the Operational Design Domain (ODD), such as lighting, weather, traffic, and road variations. Yet collecting real-world data to cover this spectrum is costly, time-consuming, and often impractical. Recent advances in language-driven image editing offer a promising alternative by simulating diverse scenarios through text-based modifications. However, progress has been limited by the absence of a dedicated dataset for driving-scene editing. To address this gap, we introduce, to the best of our knowledge, the first dataset specifically designed for language-driven editing of driving scenes. Our dataset combines real-world and synthetic street-scene images and supports 12 distinct editing tasks, spanning global modifications (e.g., weather, season, time of day) and fine-grained local edits (e.g., altering vehicle or pedestrian attributes). Crucially, each edit is paired with detailed textual and visual instructions; together with our proposed supervised and unsupervised fine-tuning objectives, these enable state-of-the-art image editing models to follow instructions faithfully and preserve critical content. Experimental results demonstrate that training language-driven editing models with our dataset and objectives yields substantial gains in prompt alignment, visual fidelity, generation realism, and downstream driving-task performance on edited street-scene images across diverse driving domains.
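For illustration, a single record in such a dataset might pair a source image, a target edited image, and the accompanying textual and visual instructions. The sketch below is a hypothetical schema inferred only from the abstract; all field names and values are assumptions, not the authors' actual format.

```python
# Hypothetical sketch of one record in a driving-scene editing dataset,
# based solely on the abstract's description (paired textual and visual
# instructions, 12 tasks spanning global and local edits). Field names
# and example values are illustrative assumptions, not the real schema.
from dataclasses import dataclass

@dataclass
class DrivingSceneEdit:
    source_image: str        # path to the original street-scene image (real or synthetic)
    target_image: str        # path to the edited ground-truth image
    task: str                # one of the 12 editing tasks, e.g. "time_of_day"
    scope: str               # "global" (weather, season, time of day) or "local" (vehicle/pedestrian attributes)
    text_instruction: str    # detailed textual instruction describing the edit
    visual_instruction: str  # path to the visual instruction (e.g., a mask or reference image)

example = DrivingSceneEdit(
    source_image="scenes/000123_day.png",
    target_image="scenes/000123_night.png",
    task="time_of_day",
    scope="global",
    text_instruction="Change the scene from daytime to nighttime while "
                     "keeping all vehicles and road markings unchanged.",
    visual_instruction="instructions/000123_night_ref.png",
)
```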
Primary Area: datasets and benchmarks
Submission Number: 3537