Abstract: Action anticipation is crucial for intelligent systems such as autonomous vehicles and augmented reality (AR) devices. Existing studies focus on architectural improvements but often overlook the hierarchical relationship between human intentions and the behaviors that result from them. In this work, we propose “Superclass”, a novel approach that leverages hierarchical action labels to improve action anticipation performance. Our method introduces additional annotations that combine verbs, nouns, and actions to capture the relationships between different levels of human activity. We evaluate the approach by integrating Superclass with two base models, AVT [8] and InAViT [13]. Experiments on the EPIC-KITCHENS-100 dataset demonstrate the effectiveness and broad applicability of our method. When applied to InAViT, the current top-performing model on the EPIC-KITCHENS-100 evaluation server, Superclass improves the top-5 class-mean accuracy for verbs, nouns, and actions by 0.62%, 3.36%, and 1.95%, respectively.
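The reported metric, top-5 class-mean accuracy, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes the usual reading of the metric: a sample counts as correct when its true label is among the k highest-scoring classes, accuracy is computed separately per class, and the per-class values are averaged over the classes that appear in the labels. The function name `class_mean_top_k_accuracy` and the toy scores are hypothetical.

```python
from collections import defaultdict


def class_mean_top_k_accuracy(scores, labels, k=5):
    """Average of per-class top-k accuracy (assumed definition, see lead-in).

    scores: list of per-sample score lists, one score per class id.
    labels: list of ground-truth class ids (ints), one per sample.
    """
    hits = defaultdict(int)    # per-class count of top-k hits
    counts = defaultdict(int)  # per-class count of samples
    for sample_scores, label in zip(scores, labels):
        # Class ids sorted by descending score, truncated to the k best.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        counts[label] += 1
        if label in ranked[:k]:
            hits[label] += 1
    # Average over classes present in the labels, so rare classes
    # weigh as much as frequent ones.
    per_class = [hits[c] / counts[c] for c in counts]
    return sum(per_class) / len(per_class)


# Toy example with 3 classes (hypothetical scores).
toy_scores = [[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7]]
toy_labels = [0, 1, 1, 2]
top1 = class_mean_top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = class_mean_top_k_accuracy(toy_scores, toy_labels, k=2)
```

Averaging per class rather than per sample keeps frequent classes from dominating the score, which matters on long-tailed label sets such as the verb, noun, and action vocabularies of EPIC-KITCHENS-100.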
External IDs: doi:10.1007/978-3-032-01169-5_13