Ego-$A^{3}$: Adaptive Fusion-Based Disentangled Transformer for Egocentric Action Anticipation

Minhyuk Kim, Jong Won Jung, Eungi Lee, Seok Bong Yoo

Published: 2025, Last Modified: 25 Jan 2026, Venue: ICRA 2025, License: CC BY-SA 4.0

Abstract: Recently, egocentric action anticipation for wearable robotics cameras has gained considerable attention due to its capability to analyze nouns and verbs from a first-person view. However, this field encounters challenges due to various uncertainties, such as action-irrelevant information and semantically fused representations of verbs and nouns. To overcome these issues, we introduce Ego-$A^{3}$, designed to improve the robustness and reliability of egocentric action anticipation systems. Ego-$A^{3}$ adaptively extracts action-relevant data to efficiently utilize additional information beyond visual data. Additionally, Ego-$A^{3}$ produces effective disentangled representations for verbs and nouns by employing learnable verb and noun queries. Experiments on the EpicKitchens-100 and EGTEA Gaze+ datasets demonstrate that Ego-$A^{3}$ outperforms existing methods in top-1 accuracy and mean top-5 recall. Our code is publicly available at https://github.com/alsgur0720/egocentricanticipation.
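For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how top-1 accuracy and class-mean top-5 recall are commonly computed. It is not taken from the authors' code: the function names (`top_k_hit`, `top1_accuracy`, `mean_top5_recall`) are hypothetical, and the sketch assumes that mean top-5 recall averages per-class recall over the classes present in the evaluated labels. The exact protocol is defined by each benchmark.

```python
from collections import defaultdict


def top_k_hit(scores, label, k):
    """Return True if `label` is among the k highest-scoring class indices."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


def top1_accuracy(all_scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = sum(top_k_hit(s, y, 1) for s, y in zip(all_scores, labels))
    return hits / len(labels)


def mean_top5_recall(all_scores, labels, k=5):
    """Average over classes of the per-class top-k recall.

    Assumption: only classes that occur in `labels` contribute, so rare
    classes weigh as much as frequent ones.
    """
    per_class_hits = defaultdict(int)
    per_class_total = defaultdict(int)
    for s, y in zip(all_scores, labels):
        per_class_total[y] += 1
        per_class_hits[y] += int(top_k_hit(s, y, k))
    recalls = [per_class_hits[c] / per_class_total[c] for c in per_class_total]
    return sum(recalls) / len(recalls)
```

Because the recall is averaged per class rather than per sample, a model that only predicts frequent verbs or nouns well scores lower on this metric than on plain top-5 accuracy.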

External IDs: dblp:conf/icra/KimJLY25