Does AI Assistance Preserve or Collapse Disagreement? A Study of Pre-Annotations in Ambiguous Video Labeling

Juan Gutiérrez; Víctor Gutiérrez-García; Jose Luis Blanco-Murillo

Published: 02 Jun 2026, Last Modified: 09 Jun 2026. Venue: Pluralistic-Alignment 2026. Readers: Everyone. License: CC BY 4.0

Keywords: Human-in-the-Loop (HITL), Evaluation Methodology, Video Annotation, Cross-Modal Learning

TL;DR: AI pre-annotations made ambiguous video labeling faster and more consistent, while preserving similar human-consensus alignment; we audit when this helps standardize boundaries vs. collapses disagreement.

Abstract: AI-generated Pre-Annotations can accelerate video labeling, but they may also anchor annotators to model priors and suppress disagreement that is valuable for pluralistic dataset construction. We study this tradeoff in ambiguous temporal video annotation, where annotators choose event boundaries and assign context-dependent labels such as "normal" or "abnormal." We introduce a controlled audit protocol that separates annotation cost, consensus alignment, inter-annotator consistency, temporal-boundary variation, semantic-label variation, latent-space standardization, and edit behavior. In a counterbalanced pilot study with 18 annotators and 180 annotation sessions, a fixed CLIP-based Pre-Annotation engine reduced mean annotation time by 23.11%; 72% of annotators were faster with assistance, with a median per-annotator gain of 35%. Assistance increased inter-annotator consistency and CLIP-space standardization, while semantic-label entropy changed only slightly in aggregate. A six-annotator human consensus diagnostic showed comparable alignment across conditions (AMI $\approx 0.64$ in both), but this diagnostic is self-including and should be interpreted as descriptive rather than as independent evidence that model influence is absent. Overall, the results suggest that Pre-Annotations acted mainly as editable temporal scaffolds in this pilot setting, while leaving open the possibility of subtler semantic anchoring effects. We contribute an audit framework and anonymized interaction traces for studying when AI-assisted annotation preserves, reshapes, or collapses human disagreement.
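The abstract reports three different time summaries (a reduction in mean time, the share of annotators who were faster, and a median per-annotator gain) alongside semantic-label entropy. The minimal Python sketch below illustrates how such quantities could be computed and why the first and third can differ. It is not the authors' code: the function names, the relative-reduction definition, the use of Shannon entropy in bits, and the toy numbers are all assumptions made for illustration, and none of the values are the study's measurements.

```python
import math
from collections import Counter
from statistics import mean, median


def relative_reduction(baseline, assisted):
    """Fractional reduction of `assisted` relative to `baseline` (assumed definition)."""
    return (baseline - assisted) / baseline


def label_entropy(labels):
    """Shannon entropy, in bits, of a list of categorical labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical per-annotator mean session times in seconds (toy data only).
manual_s = {"a1": 100.0, "a2": 80.0, "a3": 120.0, "a4": 90.0}
assisted_s = {"a1": 60.0, "a2": 88.0, "a3": 90.0, "a4": 72.0}

# Per-annotator gain: each annotator is compared against their own baseline.
gains = {a: relative_reduction(manual_s[a], assisted_s[a]) for a in manual_s}

# Reduction in the pooled mean time; slow annotators weigh more here.
mean_time_reduction = relative_reduction(
    mean(manual_s.values()), mean(assisted_s.values())
)

# Share of annotators who were faster with assistance.
share_faster = sum(g > 0 for g in gains.values()) / len(gains)

# Median of the per-annotator gains; robust to a few extreme annotators.
median_gain = median(gains.values())

# Hypothetical labels assigned to one clip in each condition (toy data only).
manual_labels = ["normal", "abnormal", "normal", "abnormal"]
assisted_labels = ["normal", "normal", "normal", "abnormal"]
entropy_manual = label_entropy(manual_labels)
entropy_assisted = label_entropy(assisted_labels)

print(f"mean time reduction: {mean_time_reduction:.3f}")
print(f"share faster:        {share_faster:.2f}")
print(f"median gain:         {median_gain:.3f}")
print(f"label entropy:       {entropy_manual:.3f} -> {entropy_assisted:.3f} bits")
```

In this toy example the pooled mean reduction and the median per-annotator gain are different numbers, which is the same kind of gap the abstract reports between its 23.11% and 35% figures. A drop in label entropy under assistance would be one possible signal of collapsing disagreement, although an aggregate entropy can stay nearly flat while individual clips shift.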

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 130
