Abstract: Consider a real-world problem where we wish to adapt an existing action recognition (AR) model to a new environment. A common approach is to fine-tune the model on a set of labeled videos of actions performed in that environment. Such an approach is costly, since we need to record the videos, annotate them, and fine-tune the model. At the same time, there has been recent interest in AR models that take an object-centric approach. In many cases these models are more structured, e.g., they contain a module dedicated to object localization. Could we perform adaptation to a new environment via objects alone? We propose to re-use a previously trained AR model and \emph{only adapt its object localization module}. Specifically, we train class-agnostic detectors that can adapt to each new environment. The idea of performing AR model adaptation via objects is novel and promising. While it requires some images with annotated object locations in the new environment, this supervision is cheaper than that of the conventional approach above. We conduct experiments on unseen kitchens in within- and across-dataset settings using the EPIC-KITCHENS and EGTEA benchmarks, and show that AR models equipped with our object detectors can efficiently adapt to new environments.
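To make the adaptation recipe in the abstract concrete, below is a minimal sketch (not the authors' released code) of the idea: a pretrained object-centric AR model is kept frozen, and only a class-agnostic object detector is fine-tuned on a few annotated frames from the new environment. Names such as `ObjectCentricARModel` and `new_env_frames` are hypothetical placeholders; the detector here is assumed to be a standard torchvision Faster R-CNN trained with two labels (background vs. object).

```python
import torch
import torchvision

# Class-agnostic detector: only two "classes" (background vs. object),
# so the detector can transfer to environments with unseen object categories.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights_backbone=torchvision.models.ResNet50_Weights.DEFAULT,
    num_classes=2,
)

# Hypothetical pretrained AR model whose localization module is swapped out.
# ar_model = ObjectCentricARModel.load_pretrained(...)
# for p in ar_model.parameters():
#     p.requires_grad = False  # the AR model itself is never updated

optimizer = torch.optim.SGD(detector.parameters(), lr=1e-3, momentum=0.9)

def adapt_detector(detector, new_env_frames, epochs=5):
    """Fine-tune only the detector on (image, boxes) pairs from the new kitchen."""
    detector.train()
    for _ in range(epochs):
        for images, targets in new_env_frames:
            # targets: list of dicts with "boxes" (N, 4) and "labels" (all 1 = object)
            loss_dict = detector(images, targets)
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return detector

# At inference, the adapted detector's boxes would be fed to the frozen AR model:
# detections = detector(video_frames)           # detector in eval mode
# boxes = [d["boxes"] for d in detections]
# action_logits = ar_model(video_frames, boxes)  # hypothetical interface
```

The design choice this illustrates is that only the detector's parameters enter the optimizer, so annotation cost in the new environment is limited to bounding boxes on a small set of frames rather than full action labels on videos.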
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)