WeatherFLUX: Universal Weather Translation with Diffusion Models

Baik Seunghyun; Sangho Kim; Euntai Kim

20 Sept 2025 (modified: 03 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0

Keywords: diffusion models, image-to-image translation, weather translation, in-context learning

Abstract: Reliable camera-based autonomous driving needs a large number of road scene images across diverse weather conditions, but public datasets mostly contain sunny scenes and have few images for snow, rain, fog, and night. Prior work has attempted to expand the range of weather conditions through image translation, but in complex road scenes the target weather is often rendered unreliably and semantic content is not preserved. To address these limitations, we present WeatherFLUX, a diffusion framework for universal weather translation that learns from limited, weakly paired data. The framework supports bidirectional translation between sunny and each of snow, rain, fog, and night while preserving semantic content. WeatherFLUX adopts triptych prompting, an in-context setup that stacks a reference image, a source image, and a blank canvas into a single input, helping the model learn what to change and what to keep. This setup works best with strictly matched pairs; in practice we rely on weak pairs, so the model captures the target weather style well but can introduce geometric differences between the source and the output. To mitigate this issue, WeatherFLUX introduces three techniques that reduce these differences and improve consistency. First, image alignment reduces geometric differences between the source and the target before training. Second, frequency-aware prompting forms a fused conditional embedding that reflects weather style from the reference and semantic cues from the source. Third, a semantic preservation loss encourages structural agreement between the output and the source, which helps maintain boundaries and layout. Together, these components yield photorealistic, scene-consistent translations with strong preservation of semantic content. Extensive qualitative results demonstrate that our method is highly competitive.
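As a rough illustration of the triptych input described in the abstract, the sketch below stacks a reference image, a source image, and a blank canvas into one array and returns a mask marking the canvas region. The abstract does not specify the layout, so the horizontal arrangement, the zero-filled canvas, the mask, and the function name `make_triptych` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def make_triptych(reference, source, fill_value=0):
    """Stack reference, source, and a blank canvas side by side.

    Hypothetical helper: the horizontal layout and the zero-filled
    canvas are assumptions, not details taken from the paper.
    Returns the stacked image and a boolean mask that is True only
    over the canvas region (the part a model would fill in).
    """
    if reference.shape != source.shape:
        raise ValueError("reference and source must share the same shape")

    height, width, channels = source.shape

    # Blank canvas with the same size and dtype as the source image.
    canvas = np.full((height, width, channels), fill_value, dtype=source.dtype)

    # Panels are concatenated along the width axis: [reference | source | canvas].
    triptych = np.concatenate([reference, source, canvas], axis=1)

    # Mask marks the third panel as the region to be generated.
    mask = np.zeros((height, 3 * width), dtype=bool)
    mask[:, 2 * width:] = True

    return triptych, mask
```

For example, two 4x5 RGB images produce a 4x15 triptych whose last five columns are the blank canvas. In an in-context setup of this kind, the first two panels would act as conditioning while only the masked panel is generated.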

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 24389
