Challenge report: Track 2 of Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

Published: 07 Sept 2024, Last Modified: 15 Sept 2024 · ECCV 2024 W-CODA Workshop Abstract Paper Track · CC BY 4.0
Keywords: Autonomous Driving, Video Generation, Pre-trained motion layer
TL;DR: Our primary contribution is the introduction of a pre-trained motion layer into the model, enabling the generation of videos with strong temporal consistency by training only this layer.
Subject: Corner case mining and generation for autonomous driving
Confirmation: I have read and agree with the submission policies of ECCV 2024 and the W-CODA Workshop on behalf of myself and my co-authors.
Abstract: This paper presents our submission to Track 2 of the Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving challenge. While the field of autonomous driving has garnered significant interest, the collection and annotation of street scenes remain prohibitively expensive. In this work, we review prior diffusion-based approaches to street-scene generation and propose our own. By combining pre-trained image and motion layers, we achieve high-quality results with minimal training. To generate videos of arbitrary length with smooth transitions, we employ a sliding-window technique that mitigates discrepancies between consecutive segments.
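The sliding-window idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate_segment` is a hypothetical stand-in for the diffusion video generator, and the window/overlap sizes and linear blending scheme are assumptions for demonstration.

```python
import numpy as np

def generate_segment(cond_frames, length, h=4, w=4):
    """Hypothetical stand-in for the diffusion video generator.
    Conditions on the trailing frames of the previous window so that
    the overlap region agrees with what was already generated."""
    rng = np.random.default_rng(0)
    seg = rng.random((length, h, w))
    if cond_frames is not None:
        # Start the new segment from the conditioning frames.
        seg[: len(cond_frames)] = cond_frames
    return seg

def sliding_window_video(total_len, window=8, overlap=2):
    """Generate an arbitrarily long video as overlapping windows,
    linearly blending the overlapping frames to smooth transitions."""
    video = []
    prev_tail = None
    while len(video) < total_len:
        seg = generate_segment(prev_tail, window)
        if prev_tail is None:
            video.extend(seg)
        else:
            # Blend the overlap: ramp from the old frames to the new ones.
            for i in range(overlap):
                a = (i + 1) / (overlap + 1)
                video[-overlap + i] = (1 - a) * video[-overlap + i] + a * seg[i]
            video.extend(seg[overlap:])
        prev_tail = seg[-overlap:]
    return np.stack(video[:total_len])

clip = sliding_window_video(total_len=20, window=8, overlap=2)
print(clip.shape)  # (20, 4, 4)
```

Because each new window is conditioned on the previous window's tail frames, the blended overlap stays consistent across segment boundaries, which is the property the sliding window is meant to preserve.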
Supplementary Material: zip
Submission Number: 2