Noisy Annotations in Segmentation

10 May 2025 (modified: 30 Oct 2025) · Submitted to NeurIPS 2025 Datasets and Benchmarks Track · CC BY 4.0
Keywords: noisy labels, instance segmentation, cardiac ultrasound
TL;DR: Benchmarking label noise in instance segmentation using both varying predefined noise types and auto-annotation noise on real and synthetic data.
Abstract: We propose four noise-augmented benchmarks—**COCO-N**, **CityScapes-N**, **VIPER-N** and the weak-annotation track **COCO-WAN**—that provide a unified test-bed for studying annotation noise in instance segmentation. A parametric engine stochastically perturbs mask boundaries, drifts spatial extents, flips categories and omits instances at three severity tiers, producing Monte-Carlo variants of any COCO-style corpus. Evaluating popular segmentation models such as Mask R-CNN, Mask2Former, YOLACT and SAM reveals drops of up to 35% in mask mAP under moderate noise, underscoring the limits of current learning-from-noisy-labels techniques when errors are spatial rather than purely categorical. The proposed **Benchmark-N** suite establishes a reproducible baseline for noise-aware segmentation and motivates future work on robust objectives, data-centric annotation pipelines and noise-adaptive architectures.
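To make the abstract's noise model concrete, the following is a minimal sketch of two of the described corruptions—category flipping and instance omission—applied to COCO-style annotation dicts. The function name, the single `severity` probability parameter, and the seeded-RNG design are assumptions for illustration, not the authors' actual implementation (which also perturbs mask boundaries and spatial extents).

```python
import random

def perturb_annotations(annotations, num_categories, severity, seed=0):
    """Inject class-flip and instance-omission noise into COCO-style annotations.

    Hypothetical sketch: `severity` is the per-annotation probability of each
    corruption, standing in for the paper's three severity tiers.
    """
    rng = random.Random(seed)  # seeded so each Monte-Carlo variant is reproducible
    noisy = []
    for ann in annotations:
        # Drop the instance entirely with probability `severity` (omission noise).
        if rng.random() < severity:
            continue
        ann = dict(ann)  # shallow copy so the clean corpus is left untouched
        # Flip the label to a random *different* class with probability `severity`.
        if rng.random() < severity:
            others = [c for c in range(num_categories) if c != ann["category_id"]]
            ann["category_id"] = rng.choice(others)
        noisy.append(ann)
    return noisy
```

Re-running with different seeds at a fixed severity yields the Monte-Carlo variants of a corpus that the abstract describes; severity 0 is a no-op and severity 1 removes every instance.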
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 1208