Therbligs in Action: Video Understanding through Motion Primitives

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Abstract: In this paper we introduce a rule-based, compositional, and hierarchical model of action whose atoms are Therbligs, a consistent, expressive, contact-centered representation of action. Over these atoms we introduce a differentiable method of rule-based reasoning to regularize for logical consistency. Our approach is complementary to other approaches in that the Therblig-based representations produced by our architecture augment rather than replace existing architectures' representations. We release the first Therblig-centered annotations over two popular video datasets, EPIC Kitchens 100 and 50-Salads. We evaluate our system on the task of action segmentation. With a base GRU architecture, it improves accuracy over the baseline by 5.6% and 4.1% absolute (14.4% and 6.5% relative) on EPIC Kitchens and 50-Salads, respectively, and improves all other metrics as well. We also demonstrate the benefits of adopting Therblig representations for two state-of-the-art approaches, MSTCN++ and ASFormer, observing relative improvements of 10.3% and 10.7%, respectively, on EPIC Kitchens and 9.3% and 6.1%, respectively, on 50-Salads. All code and data are to be released upon paper acceptance.
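The abstract reports the GRU accuracy gains both in percentage points and as relative improvements. As a rough reading aid, the short Python sketch below back-computes the baseline accuracy that each pair of figures would imply. These baseline values are not stated in the abstract; they are inferred here under the assumption that relative gain = absolute gain / baseline accuracy, and they are only as precise as the rounded figures allow. The function name `implied_baseline` is hypothetical.

```python
def implied_baseline(absolute_gain_pts: float, relative_gain_pct: float) -> float:
    """Baseline accuracy (in %) implied by an absolute gain (percentage points)
    and a relative gain (%), assuming relative = absolute / baseline."""
    return absolute_gain_pts / (relative_gain_pct / 100.0)


# (absolute gain, relative gain) for the base GRU, as quoted in the abstract.
reported = {
    "EPIC Kitchens": (5.6, 14.4),
    "50-Salads": (4.1, 6.5),
}

# Inferred values only: the abstract does not report baseline accuracies.
implied = {name: implied_baseline(a, r) for name, (a, r) in reported.items()}

for name, baseline in implied.items():
    absolute_gain, _ = reported[name]
    print(f"{name}: implied baseline ~{baseline:.1f}%, "
          f"implied accuracy with Therbligs ~{baseline + absolute_gain:.1f}%")
```

The same relation could in principle be applied to the MSTCN++ and ASFormer results, but the abstract gives only relative improvements for those, so no baseline can be recovered from it.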
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)