Approximate Size Targets Are Sufficient for Accurate Semantic Segmentation

14 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: Cardinality of segments, KL divergence
Abstract: We propose a new general form of image-level supervision for semantic segmentation based on approximate targets for the relative size of segments. For each training image, such targets are represented by a categorical distribution for the "expected" average prediction over the image pixels. We motivate the zero-avoiding variant of KL divergence as a general training loss for any segmentation architecture, leading to quality on par with full pixel-level supervision. However, our image-level supervision is significantly less expensive: it needs to know only the approximate fraction of an image occupied by each class. Such estimates are easy for a human annotator to produce compared to pixel-accurate labeling. Our loss shows significant robustness to errors in the size targets, which may even improve generalization quality. The proposed size targets can be seen as an extension of the standard class tags, which correspond to non-zero size targets in each image. Using only a minimal amount of extra information, our supervision improves and simplifies training. It works on standard segmentation architectures as is, unlike tag-based methods that require complex specialized modifications and multi-stage training.
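
As a rough illustration of the loss described in the abstract, the sketch below compares an approximate per-image size target with the prediction averaged over all pixels, using the zero-avoiding direction of KL divergence (target in the first argument). The softmax-based per-pixel probabilities, the epsilon clamping, and the function name `size_target_loss` are assumptions made for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def size_target_loss(logits, size_targets, eps=1e-8):
    """Hypothetical sketch of an image-level size-target loss.

    logits:       (B, C, H, W) raw network outputs for a batch of images.
    size_targets: (B, C) approximate fraction of each image occupied by
                  each class; each row sums to 1 (a categorical distribution).
    Returns the zero-avoiding KL divergence KL(target || mean prediction),
    averaged over the batch.
    """
    probs = F.softmax(logits, dim=1)           # per-pixel class probabilities
    mean_pred = probs.mean(dim=(2, 3))         # (B, C) average prediction over pixels
    # Zero-avoiding direction: placing the target in the first argument
    # penalizes the mean prediction for vanishing wherever the target is non-zero.
    kl = (size_targets * (size_targets.clamp_min(eps).log()
                          - mean_pred.clamp_min(eps).log())).sum(dim=1)
    return kl.mean()
```

In this sketch the loss could be dropped into any standard segmentation training loop in place of the usual pixel-wise cross-entropy, with `size_targets` supplied by an annotator's rough size estimates rather than dense masks.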
Primary Area: Machine vision
Submission Number: 7903