Abstract: Egocentric action recognition is becoming an increasingly researched topic thanks to the rising popularity of wearable cameras. Despite the numerous publications in the field, the learned representations still suffer from an intrinsic “environmental bias”. To address this issue, domain adaptation and generalization approaches have been proposed, which operate either by adapting the model to target data during training or by learning a model able to generalize to unseen videos by exploiting knowledge from multiple source domains. In this work, we propose to adapt a model trained on source data to novel environments at test time, making adaptation practical in real-world scenarios where target data are not available at training time. On the popular EPIC-Kitchens dataset, we present a new benchmark for Test-Time Adaptation (TTA) in egocentric action recognition. Moreover, we propose a new multi-modal TTA approach, which we call RNA++, and combine it with a new set of losses aimed at reducing the classifier’s uncertainty, showing remarkable results compared to existing TTA methods inherited from image classification. Code available: https://github.com/EgocentricVision/RNA-TTA
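To make the test-time adaptation setting concrete, the sketch below shows a generic TTA loop that reduces the classifier's uncertainty on unlabeled target batches via entropy minimization (in the spirit of TENT-style methods from image classification). This is an illustrative assumption of what such uncertainty-reducing losses can look like, not the authors' RNA++ method; `model` and `test_loader` are hypothetical placeholders.

```python
# Hypothetical sketch of test-time adaptation by entropy minimization.
# NOT the RNA++ method from the paper; a generic TENT-style baseline.
import torch
import torch.nn.functional as F


def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the softmax predictions (lower = more confident)."""
    log_probs = F.log_softmax(logits, dim=1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=1).mean()


def adapt_at_test_time(model: torch.nn.Module, test_loader, lr: float = 1e-4):
    # Update only normalization-layer affine parameters, a common TTA
    # choice since no target labels are available at test time.
    bn_types = (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)
    params = [p for m in model.modules() if isinstance(m, bn_types)
              for p in m.parameters()]
    optimizer = torch.optim.SGD(params, lr=lr)
    model.train()  # BN layers use current-batch target statistics

    predictions = []
    for clips in test_loader:          # unlabeled target video clips
        logits = model(clips)
        loss = entropy_loss(logits)    # adapt by reducing uncertainty
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():          # predict with the adapted model
            predictions.append(model(clips).argmax(dim=1))
    return torch.cat(predictions)
```

In a multi-modal video setting such as the one the abstract describes, the same loop would typically run over clips carrying several modalities (e.g. RGB and audio), with the loss applied to the fused prediction.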