Continuous Online Action Detection from Egocentric Videos

ICLR 2026 Conference Submission 12497 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Egocentric Vision, Online Action Detection
TL;DR: We tackle egocentric online action detection by enabling on-device, single-pass training on continuous video streams, improving adaptation and generalization without storing data. We also release a new benchmark from Ego4D.
Abstract: Online Action Detection (OAD) tackles the challenge of recognizing actions as they unfold, relying solely on current and past frames. However, most OAD models are trained offline and assume static environments, limiting their adaptability to the dynamic, user-specific contexts typical of wearable devices. To address these limitations, we propose Continuous Online Action Detection (COAD), a novel task formulation in which models not only perform online action detection but also continuously learn and adapt on-the-fly from streaming videos, without storing data or requiring multiple training passes. This paradigm naturally fits egocentric vision on wearable devices, given the highly dynamic, personalized, and resource-constrained nature of such devices. We introduce a large-scale egocentric OAD benchmark dataset (Ego-OAD) and develop training strategies that enhance both adaptation to individual users and generalization to unseen environments. Our results on Ego-OAD demonstrate that continuous learning from streaming videos improves adaptation to the user’s environment by up to 20% in top-5 accuracy and improves generalization to new scenarios by up to 7%, advancing the development of personalized egocentric AI systems.
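Below is a minimal sketch of the single-pass, predict-then-adapt loop that the COAD setting implies: the model first emits an online prediction for the incoming frame, then immediately updates on that frame and discards it, with no replay buffer and no second pass. All names here (`StreamModel`, `synthetic_stream`, the feature and class sizes) are hypothetical illustrations, not the paper's actual architecture or training recipe.

```python
import torch
import torch.nn as nn

NUM_CLASSES, FEAT_DIM, HIDDEN = 10, 512, 256  # assumed sizes for illustration

class StreamModel(nn.Module):
    """Tiny recurrent detector: one hidden state carried across the stream."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(FEAT_DIM, HIDDEN)
        self.head = nn.Linear(HIDDEN, NUM_CLASSES)

    def forward(self, feat, h):
        h = self.rnn(feat, h)
        return self.head(h), h

def synthetic_stream(n=100):
    """Stand-in for a continuous egocentric video stream of (feature, label) pairs."""
    for _ in range(n):
        yield torch.randn(1, FEAT_DIM), torch.randint(0, NUM_CLASSES, (1,))

model = StreamModel()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
h = torch.zeros(1, HIDDEN)

# Single pass over the stream: predict first (online detection), then adapt
# on the just-seen frame (continuous learning). No frame is stored or revisited.
for feat, label in synthetic_stream():
    logits, h = model(feat, h.detach())   # detach: truncate backprop to one step
    pred = logits.argmax(dim=-1)          # online prediction for the current frame
    loss = nn.functional.cross_entropy(logits, label)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Detaching the hidden state each step keeps memory constant over an unbounded stream, which matches the on-device, resource-constrained setting the abstract describes; the real method's loss, supervision signal, and update schedule may differ.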
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12497