Boundary Voting Network for Ambiguity-Aware Timestamp-Supervised Action Segmentation

Runzhong Zhang, Yueqi Duan, Yang Chen, Weipeng Hu, Chen Cai, Suchen Wang, Yap-Peng Tan

Published: 01 Nov 2025, Last Modified: 22 Feb 2026
Venue: IEEE Transactions on Circuits and Systems for Video Technology
License: CC BY-SA 4.0
Abstract: Timestamp-supervised action segmentation aims to segment and classify actions in untrimmed videos when only a single, randomly chosen frame is annotated per action. Precisely localizing action boundaries from these timestamp annotations is crucial in this setting, as it enables the generation of framewise pseudo-labels and the use of well-explored fully supervised training. However, prevailing methods struggle with the intrinsic uncertainty of boundary localization, because features in action-transition regions are less discriminative. The resulting imprecise boundary estimates significantly reduce the stability and reliability of the pseudo-labels in these ambiguous regions and, in turn, degrade the performance of the trained segmentation models. In this paper, we introduce the boundary voting network, which mitigates feature ambiguity by hierarchically propagating video-level global prior knowledge into local action-transition regions. Key action representations are generated as votes throughout the video and directed at action-transition regions, where all votes collaboratively contribute to feature enhancement and boundary localization refinement. Extensive experiments on the GTEA, 50Salads, and Breakfast datasets demonstrate the effectiveness of our method.
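To make the role of boundary localization concrete, the following minimal sketch shows how framewise pseudo-labels could be filled in once one boundary has been estimated between each pair of consecutive timestamps. It is an illustration of the general timestamp-supervised setting only, not the boundary voting network itself: the function name `pseudo_labels_from_boundaries`, its arguments, and the convention that a boundary is the first frame of the next action are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def pseudo_labels_from_boundaries(
    num_frames: int,
    timestamps: Sequence[Tuple[int, int]],
    boundaries: Sequence[int],
) -> List[int]:
    """Expand timestamp annotations into one label per frame.

    timestamps: (frame_index, action_label) pairs, sorted by frame index,
        one per action instance (hypothetical input format).
    boundaries: boundaries[i] is the assumed first frame of action i + 1,
        so there is exactly one boundary between consecutive timestamps.
    """
    if len(boundaries) != len(timestamps) - 1:
        raise ValueError("need exactly one boundary between consecutive timestamps")

    # Segment i covers the half-open frame range [starts[i], ends[i]).
    starts = [0, *boundaries]
    ends = [*boundaries, num_frames]

    labels: List[int] = []
    for (frame, label), start, end in zip(timestamps, starts, ends):
        # Each annotated frame must lie inside the segment it labels.
        if not start <= frame < end:
            raise ValueError(f"timestamp {frame} lies outside segment [{start}, {end})")
        labels.extend([label] * (end - start))
    return labels


# Three actions in a 10-frame video, annotated at frames 1, 5, and 8.
print(pseudo_labels_from_boundaries(10, [(1, 0), (5, 2), (8, 1)], [3, 7]))
```

In this sketch every frame's label depends entirely on where the boundaries are placed, so an error of a few frames in an ambiguous transition region directly mislabels those frames in the training targets.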