Cluster-Refined Optimal Transport for Unsupervised Action Segmentation

Published: 2025, Last Modified: 05 Nov 2025, Venue: ICASSP 2025, License: CC BY-SA 4.0
Abstract: Action segmentation in untrimmed videos is essential for comprehensive video understanding. Despite significant progress in unsupervised methods, capturing long-range dependencies and short-duration actions at the same time remains challenging. To address this challenge, this paper introduces the Cluster-Refined Optimal Transport (CROT) method, which combines hierarchical clustering and optimal transport for unsupervised action segmentation. We first hierarchically cluster video frame representations to capture long-range dependencies and generate pseudo-boundaries. Initial pseudo-labels are then obtained via optimal transport, which ensures that short-duration actions are recognized. Finally, these pseudo-labels are refined using the pseudo-boundaries to produce the final segmentation. Extensive experiments on three public datasets (YouTube Instructions, Breakfast, and 50Salads) demonstrate that our method performs on par with or better than previous approaches.
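The three stages in the abstract (cluster, transport, refine) can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several choices are assumptions because the abstract does not specify them: the hierarchical clustering is taken to be centroid-linkage agglomerative clustering in which only temporally adjacent segments may merge; the optimal transport step is entropic (Sinkhorn) OT with uniform marginals over frames and actions; the action prototypes are given rather than learned; and refinement is a majority vote of the OT labels inside each pseudo-segment. All function names (`temporal_agglomerative_boundaries`, `sinkhorn`, `refine_with_boundaries`, `crot_sketch`) are hypothetical.

```python
import numpy as np


def temporal_agglomerative_boundaries(x, n_segments):
    """Merge temporally adjacent segments bottom-up until n_segments remain.

    Hypothetical stand-in for CROT's hierarchical clustering: centroid linkage,
    restricted so that only neighbouring segments may merge.
    Returns the start frame index of each pseudo-segment.
    """
    segs = [(t, t + 1) for t in range(len(x))]  # each frame starts as a segment
    while len(segs) > n_segments:
        means = [x[s:e].mean(axis=0) for s, e in segs]
        gaps = [np.linalg.norm(means[i] - means[i + 1]) for i in range(len(segs) - 1)]
        i = int(np.argmin(gaps))                # closest pair of neighbours
        segs[i] = (segs[i][0], segs[i + 1][1])  # merge segment i+1 into i
        del segs[i + 1]
    return [s for s, _ in segs]


def sinkhorn(cost, eps=0.05, n_iters=100):
    """Entropic OT between frames (rows) and actions (columns).

    Assumption: uniform marginals on both sides, i.e. every action is
    expected to cover an equal share of the video.
    """
    n_frames, n_actions = cost.shape
    a = np.full(n_frames, 1.0 / n_frames)
    b = np.full(n_actions, 1.0 / n_actions)
    kernel = np.exp(-cost / eps)
    v = np.ones(n_actions)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows to match frame marginals
        v = b / (kernel.T @ u)  # rescale columns to match action marginals
    return u[:, None] * kernel * v[None, :]


def refine_with_boundaries(labels, starts, n_frames):
    """Replace the labels in each pseudo-segment with its majority OT label."""
    refined = labels.copy()
    ends = list(starts[1:]) + [n_frames]
    for s, e in zip(starts, ends):
        refined[s:e] = np.bincount(labels[s:e]).argmax()
    return refined


def crot_sketch(features, prototypes, n_segments, eps=0.05):
    """Hypothetical cluster -> transport -> refine pipeline on frame features."""
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - x @ p.T                                # cosine distance, (T, K)
    ot_labels = sinkhorn(cost, eps=eps).argmax(axis=1)  # initial pseudo-labels
    starts = temporal_agglomerative_boundaries(x, n_segments)  # pseudo-boundaries
    refined = refine_with_boundaries(ot_labels, starts, len(x))
    return starts, ot_labels, refined


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lengths = [30, 20, 10]                  # three synthetic "actions"
    truth = np.repeat(np.arange(3), lengths)
    protos = np.eye(8)[:3]                  # assumed (not learned) prototypes
    feats = protos[truth] + 0.05 * rng.standard_normal((len(truth), 8))
    starts, ot_labels, refined = crot_sketch(feats, protos, n_segments=3)
    print("pseudo-boundaries:", starts)
    print("frames changed by refinement:", int((ot_labels != refined).sum()))
    print("refined matches truth:", bool((refined == truth).all()))
```

In the synthetic demo the actions have unequal lengths, so the uniform action marginal may assign some frames of the longest action to the shortest one; the majority vote inside each pseudo-segment is what undoes such frame-level errors. This mirrors the division of labour described in the abstract, although the way CROT actually combines pseudo-boundaries and pseudo-labels may differ.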