Keywords: action recognition, object states, object attributes
TL;DR: Learning to recognize actions from only two object states, start and end.
Abstract: Object-centric actions cause changes in object states, including their visual appearance and their immediate context. We propose a computational framework that uses only two object states, start and end, and learns to recognize the underlying actions. Our approach has two modules that learn subtle changes induced by the action and suppress spurious correlations. We demonstrate that only two object states are sufficient to recognize object-centric actions. Our framework performs better than approaches that use multiple frames and a relatively large model. Moreover, our method generalizes to unseen objects and unseen video datasets.