Abstract: We address the problem of online (streaming) action segmentation for egocentric procedural task videos. While previous studies have mostly focused on offline action segmentation, where entire videos are available for both training and inference, the transition to online action segmentation is crucial for practical applications such as AR/VR task assistants. Notably, applying an offline-trained model directly to online inference results in a significant performance drop due to the inconsistency between training and inference. We propose an online action segmentation framework by first modifying existing architectures to make them causal. Second, we develop a novel action progress prediction module to dynamically estimate the progress of ongoing actions and use these estimates to refine the predictions of causal action segmentation. Third, we propose to learn task graphs from training videos and leverage them to obtain smooth and procedure-consistent segmentations. By combining progress and task-graph information with causal action segmentation, our framework effectively addresses prediction uncertainty and over-segmentation in online action segmentation and achieves significant improvements on three egocentric datasets.
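The abstract describes the approach only at a high level; the following is a minimal, hypothetical sketch of how a causal model's per-frame logits might be refined online using a progress estimate and a learned task graph. The function `refine_online`, the dictionary-based `task_graph`, and the threshold `progress_thresh` are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only (not the authors' method): refine one frame's
# causal segmentation logits with a progress estimate and a task-graph prior.
import numpy as np

def refine_online(frame_logits, prev_action, progress, task_graph, progress_thresh=0.9):
    """frame_logits : (num_classes,) raw scores from the causal model
    prev_action  : int, action predicted at the previous frame
    progress     : float in [0, 1], estimated progress of the ongoing action
    task_graph   : dict mapping an action id to the set of actions allowed to follow it
    """
    scores = frame_logits.copy()

    if progress < progress_thresh:
        # Ongoing action is likely unfinished: bias toward keeping it,
        # suppressing spurious switches (over-segmentation).
        scores[prev_action] += 1.0
    else:
        # Action is likely finishing: only task-graph-consistent transitions
        # (or staying in the current action) remain plausible.
        allowed = task_graph.get(prev_action, set()) | {prev_action}
        mask = np.full_like(scores, -np.inf)
        mask[list(allowed)] = 0.0
        scores = scores + mask

    return int(np.argmax(scores))

# Toy usage: 4 action classes and a small hand-written task graph.
rng = np.random.default_rng(0)
task_graph = {0: {1}, 1: {2, 3}, 2: {3}, 3: set()}
action, progress = 0, 0.2
for t in range(5):
    logits = rng.normal(size=4)
    action = refine_online(logits, action, progress, task_graph)
    progress = min(1.0, progress + 0.25)  # stand-in for the progress prediction module
    print(t, action)
```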