Semantic Understanding of Driving Scenes in Adverse Conditions

Published: 01 Jan 2021, Last Modified: 28 Oct 2024. License: CC BY-SA 4.0
Abstract: Level 5 autonomy for self-driving cars requires a robust perception system that can parse input images under any visual condition. However, most existing work on semantic understanding of driving scenes focuses on normal conditions, i.e., daytime and clear weather. Moreover, models trained with methods and datasets pertaining to normal conditions generalize poorly to adverse visual conditions. This thesis addresses this shortcoming by introducing methods and datasets that improve the performance of semantic scene understanding algorithms under adverse conditions. At the method level, we pursue this goal by adapting algorithms from normal to adverse conditions with minimal supervision in the latter domain. At the dataset level, we construct several driving scene datasets in adverse conditions to support the training and evaluation of algorithms in these domains, and additionally define a novel task which addresses the uncertainty of semantic image content under adverse conditions.

The contributions of the thesis in adaptation to adverse conditions pertain both to synthetic data generation and to domain adaptation strategies. First, we introduce a physically based fog simulation pipeline that operates on real outdoor scenes and generates partially synthetic foggy images. These foggy images inherit the annotations of their clear-weather counterparts and can thus be used to train models on fog in a supervised setting. Second, we present a curriculum adaptation framework that leverages synthetic and real data through a sequence of visual domains with increasing levels of adversity. This framework, named Curriculum Model Adaptation (CMAda), is semi-supervised: the synthetic data it uses include annotations, while the real data do not. The main principle of CMAda is to gradually infer the missing labels of the real data, starting from the easy domain of normal conditions and proceeding to increasingly harder domains, e.g., denser fog or darker times of day.
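This curriculum principle can be sketched as a simple loop over domains of increasing adversity; `train` and `model.predict` below are hypothetical stand-ins for an actual segmentation training pipeline, not interfaces from the thesis:

```python
def curriculum_adaptation(model, synthetic_stages, real_stages, train):
    """Sketch of the Curriculum Model Adaptation (CMAda) principle.

    synthetic_stages -- per-stage lists of (image, label) pairs; labels are
                        inherited from the clear-weather originals
    real_stages      -- per-stage lists of unlabeled real images of
                        increasing adversity (e.g. denser fog)
    train            -- stand-in training routine: train(model, data) -> model
    """
    for synthetic, real in zip(synthetic_stages, real_stages):
        # Synthetic data for this adversity level come with labels.
        data = list(synthetic)
        # Infer pseudo-labels on the real data with the current model;
        # they supervise adaptation to this harder domain.
        data += [(image, model.predict(image)) for image in real]
        model = train(model, data)
    return model
```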
In this process, the labeled synthetic data provide the supervision required to constrain the training. The labels inferred on real data in one domain are then used as pseudo-labels for adaptation to the next domain in the sequence. Third, we enhance CMAda with two guided versions, Guided Curriculum Model Adaptation (GCMA) and Map-Guided Curriculum Domain Adaptation (MGCDA). Both use weak supervision for the real data stream in the form of corresponding images of the same scenes taken under normal conditions, which serve to refine the inferred pseudo-labels in the adverse domains. While GCMA performs this refinement with a simple cross-bilateral filter, MGCDA explicitly estimates the two-view geometry of the normal-adverse image pair to warp the labels from the normal-condition view to the adverse-condition view.

This thesis also contributes several datasets for semantic driving scene understanding in adverse conditions. First, we apply our fog simulation to the Cityscapes dataset to generate Foggy Cityscapes and Foggy Cityscapes-DBF. Foggy Cityscapes is obtained with the initial version of our fog simulation and includes 25,000 foggy images, while Foggy Cityscapes-DBF is generated with the improved version, which additionally uses semantic annotations for depth refinement, and comprises 3,475 foggy images. Second, we construct two real-world foggy datasets, Foggy Driving and Foggy Zurich. Both include pixel-level semantic annotations as well as bounding box annotations. Third, we introduce Dark Zurich, a real-world dataset covering multiple times of day, including daytime, twilight, and nighttime. Dark Zurich features image-level cross-time-of-day correspondences, which enable the training of our proposed methods that rely on such correspondences, i.e., GCMA and MGCDA. It also includes pixel-level semantic annotations for 201 nighttime images, reserved for evaluation.
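The fog simulation used to generate these datasets is physically based; while the actual pipeline additionally involves careful depth denoising and completion, its core rests on the standard optical model of attenuation and airlight. A minimal sketch (function name and parameter values below are illustrative, not from the thesis):

```python
import numpy as np

def synthesize_fog(clear_image, depth, beta=0.02, atmospheric_light=0.8):
    """Standard optical model: I(x) = R(x) t(x) + L (1 - t(x)),
    with transmittance t(x) = exp(-beta * d(x)).

    clear_image       -- clear-weather image, floats in [0, 1], shape (H, W, 3)
    depth             -- scene depth in meters, shape (H, W)
    beta              -- attenuation coefficient; larger values mean denser fog
    atmospheric_light -- global atmospheric light L (assumed constant here)
    """
    t = np.exp(-beta * depth)[..., None]  # per-pixel transmittance
    return clear_image * t + atmospheric_light * (1.0 - t)
```

Nearby pixels thus keep their original appearance, while distant pixels fade toward the atmospheric light, which is why an accurate depth map is critical for realistic results.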
The final dataset we introduce in this thesis is ACDC, the Adverse Conditions Dataset with Correspondences, which consists of 4,006 annotated images evenly distributed between fog, nighttime, rain, and snow. The specialized annotation protocol for ACDC, which leverages privileged information, affords reliable ground truth and enables the use of ACDC for supervised training of large models on real-world data pertaining to adverse conditions. ACDC also establishes a new, real-world normal-to-adverse benchmark for unsupervised and weakly supervised domain adaptation. Moreover, we define the novel task of uncertainty-aware semantic segmentation on ACDC, in which evaluation is performed with the uncertainty-aware intersection-over-union (UIoU) metric. This new task additionally requires a confidence map as output, and our UIoU metric rewards predictions with confidence profiles that are consistent with human confidence.
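To illustrate the idea behind uncertainty-aware evaluation, the following is a simplified single-class sketch under the assumption that ground-truth regions humans could not label carry a dedicated invalid label; it conveys the spirit of the task, not the exact UIoU definition from the thesis. Predictions below a confidence threshold are treated as invalid, so a model is not penalized for being unconfident on regions that humans also could not annotate, while confident predictions on such regions count against it:

```python
import numpy as np

INVALID = 255  # assumed label for ground-truth regions humans could not annotate

def uncertainty_aware_iou(pred, conf, gt, cls, threshold):
    """Simplified single-class illustration of uncertainty-aware IoU.
    Predictions with confidence below `threshold` are treated as invalid."""
    confident = conf >= threshold
    pred_c = (pred == cls) & confident
    gt_c = gt == cls
    tp = np.sum(pred_c & gt_c)
    fp = np.sum(pred_c & ~gt_c)  # includes confident predictions on invalid GT
    fn = np.sum(~pred_c & gt_c)  # unconfident predictions on true class pixels
    return tp / max(tp + fp + fn, 1)
```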
