Keywords: Uncertainty Quantification, Object Detection, Transformer
TL;DR: We investigate DETR’s prediction dynamics and propose novel uncertainty quantification methods to assess image-level reliability, leveraging differences between positive and negative predictions.
Abstract: Object detection is a computer vision task with significant utility, with real-world applications ranging from autonomous driving to warehousing and medical image analysis.
Recently, the Detection Transformer (DETR) has emerged as a prominent approach, offering an end-to-end prediction pipeline.
The core innovation of DETR lies in the introduction of object queries, which attend to each other throughout the Transformer decoder layers and provide a set of outputs (i.e., bounding boxes and class probabilities) for a given image.
Despite these advances, the mechanisms by which these predictions are generated and interact remain poorly understood.
For this reason, this paper explores the underlying dynamics of DETR’s predictions and presents empirical findings that highlight how different predictions within the same image serve distinct roles, leading to varying levels of reliability across those predictions.
In particular, we investigate the significance of differentiating between positive and negative predictions for uncertainty quantification (UQ) in DETR.
Leveraging these insights, we propose novel post hoc UQ methods to quantify the image-level reliability of DETR and demonstrate their effectiveness through numerical analysis.
Submission Number: 89