Abstract: Domain adaptive panoptic segmentation promises to resolve
the long tail of corner cases in natural scene understanding. The previous
state of the art addresses this problem with cross-task consistency, careful
system-level optimization, and heuristic improvement of teacher predictions.
In contrast, we propose to build upon the remarkable capability of mask
transformers to estimate their own prediction uncertainty. Our method
avoids noise amplification by leveraging the fine-grained confidence of
panoptic teacher predictions. In particular, we modulate the loss with
mask-wide confidence and discourage back-propagation in pixels where the
teacher is uncertain or the student is confident. Experimental evaluation
on standard benchmarks reveals a substantial contribution of the proposed
selection techniques. We report 47.4 PQ on Synthia→Cityscapes, which
corresponds to an improvement of 6.2 percentage points over the state of
the art.
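The two selection rules described above (loss modulation by mask-wide confidence, and suppressing pixels with an uncertain teacher or a confident student) can be illustrated with a minimal NumPy sketch. It assumes the teacher supplies a per-pixel pseudo-label, a segment id per pixel with one confidence value per segment, and a per-pixel confidence, and that the student outputs per-pixel class probabilities. The function name, the thresholds `tau_teacher` and `tau_student`, the hard binary masking, and the normalization by the number of retained pixels are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def confidence_weighted_loss(student_probs, pseudo_labels, mask_ids, mask_conf,
                             teacher_pixel_conf, tau_teacher=0.9, tau_student=0.95,
                             eps=1e-8):
    """Hypothetical sketch of a confidence-modulated pseudo-label loss.

    student_probs:      (H, W, C) student class probabilities per pixel
    pseudo_labels:      (H, W) teacher class id per pixel (int)
    mask_ids:           (H, W) teacher segment (mask) id per pixel (int)
    mask_conf:          (M,) mask-wide teacher confidence per segment
    teacher_pixel_conf: (H, W) per-pixel teacher confidence
    """
    # Student probability assigned to the teacher's pseudo-label at each pixel.
    p_target = np.take_along_axis(student_probs, pseudo_labels[..., None], axis=-1)[..., 0]
    ce = -np.log(p_target + eps)

    # Modulate each pixel's loss by the confidence of the teacher mask it belongs to.
    weight = mask_conf[mask_ids]

    # Assumed hard selection: keep pixels where the teacher is confident
    # and the student is not yet confident.
    keep = (teacher_pixel_conf >= tau_teacher) & (student_probs.max(axis=-1) < tau_student)

    n_keep = keep.sum()
    if n_keep == 0:
        return 0.0
    return float((weight * ce * keep).sum() / n_keep)


# Usage: a 1x3 image with 2 classes and 2 teacher masks.
student_probs = np.array([[[0.5, 0.5], [0.99, 0.01], [0.2, 0.8]]])
pseudo_labels = np.array([[0, 0, 1]])
mask_ids = np.array([[0, 0, 1]])
mask_conf = np.array([0.8, 0.5])
teacher_pixel_conf = np.array([[0.95, 0.95, 0.5]])
loss = confidence_weighted_loss(student_probs, pseudo_labels, mask_ids,
                                mask_conf, teacher_pixel_conf)
```

In this toy input only the first pixel contributes: the second is dropped because the student is already confident, and the third because the teacher is uncertain. A soft weighting instead of hard masking would be an equally plausible reading of "discourage back-propagation".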