Keywords: benchmark, semantic segmentation, object detection, common corruptions, OOD, adversarial attacks
TL;DR: Robustness benchmarking tools and benchmarks for semantic segmentation and object detection.
Abstract: Reliability and generalization in deep learning are predominantly studied in the
context of image classification. Yet, real-world applications in safety-critical
domains involve a broader set of semantic tasks, such as semantic segmentation
and object detection, which come with a diverse set of dedicated model
architectures. To facilitate research towards robust model design in segmentation
and detection, we provide benchmarking tools for evaluating robustness
to distribution shifts and adversarial manipulations. We propose
SEMSEGBENCH and DETECBENCH, along with the most extensive
evaluation to date on the reliability and generalization of semantic segmentation
and object detection models. In particular, we benchmark 76 segmentation
models across four datasets and 61 object detectors across two datasets, evaluating
their performance under diverse adversarial attacks and common corruptions. Our
findings reveal systematic weaknesses in state-of-the-art models and uncover key
trends based on architecture, backbone, and model capacity. SEMSEGBENCH and
DETECBENCH are open-sourced in an Anonymous Repository (URL: https://anonymous.4open.science/r/benchmarking_reliability_generalization/) with our complete
set of 6139 evaluations. We expect the collected data to foster
future research towards improved model reliability beyond classification.
Primary Area: datasets and benchmarks
Submission Number: 18439