Challenge report: Track 2 of Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving
Keywords: Autonomous Driving, Video Generation, Pre-trained motion layer
TL;DR: We introduce a pre-trained motion layer into the model and train only this layer, which enables the generation of videos with strong temporal consistency.
Subject: Corner case mining and generation for autonomous driving
Confirmation: I have read and agree with the submission policies of ECCV 2024 and the W-CODA Workshop on behalf of myself and my co-authors.
Abstract: This paper presents our submission to Track 2 of the Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving challenge. Autonomous driving has attracted significant interest, but collecting and annotating street scenes remains prohibitively expensive. We review prior diffusion-based methods for street scene generation and propose our own approach. By combining pre-trained image and motion layers, we achieve high-quality results with minimal training. To generate videos of arbitrary length with smooth transitions, we use a sliding-window technique that mitigates discrepancies between segments.
Supplementary Material: zip
Submission Number: 2
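The abstract mentions a sliding window for generating videos of arbitrary length but does not give the window size, stride, or how overlapping segments are merged. The sketch below is one plausible reading, not the authors' implementation: it assumes fixed-size overlapping windows and a triangular-weighted average over the overlap. The function names `window_starts` and `blend_windows`, and the use of one float per frame as a stand-in for a frame tensor, are hypothetical choices for illustration.

```python
def window_starts(total_frames, window, stride):
    """Start indices of overlapping windows that together cover every frame.

    Assumption: fixed window length and stride; the source specifies neither.
    """
    if not (0 < stride <= window <= total_frames):
        raise ValueError("need 0 < stride <= window <= total_frames")
    starts = list(range(0, total_frames - window + 1, stride))
    if starts[-1] + window < total_frames:
        # Add a final window flush with the end so no frame is left uncovered.
        starts.append(total_frames - window)
    return starts


def blend_windows(segments, starts, total_frames):
    """Merge per-window outputs with a triangular-weighted average.

    Each frame is a single float here (a stand-in for a frame tensor).
    Frames near the edge of a window get low weight, so the result
    cross-fades between neighbouring windows instead of jumping.
    """
    acc = [0.0] * total_frames
    weight_sum = [0.0] * total_frames
    for seg, start in zip(segments, starts):
        n = len(seg)
        for i, value in enumerate(seg):
            w = min(i + 1, n - i)  # 1 at the window edges, larger in the middle
            acc[start + i] += w * value
            weight_sum[start + i] += w
    return [a / s for a, s in zip(acc, weight_sum)]


# Usage: two 4-frame windows with constant values 0 and 2, overlapping by 2 frames.
starts = window_starts(total_frames=6, window=4, stride=2)
video = blend_windows([[0.0] * 4, [2.0] * 4], starts, total_frames=6)
```

In this toy example the two overlapping frames take intermediate values between 0 and 2, which is the smoothing effect a sliding window is meant to provide. In a real diffusion pipeline the same averaging would more likely be applied to latents or noise predictions at each denoising step; that is also an assumption rather than something the abstract states.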