Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: few-shot action recognition, causal representation learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The goal of few-shot action recognition is to recognize actions in video sequences for which there exist only a few training samples. The challenge is to adapt a base model effectively and efficiently when the base and novel data have significant distributional disparities. To this end, we learn a model of a temporal causal mechanism from the base data by variational inference. When adapting the model by training on the novel data set, we hold certain aspects of the causal mechanism fixed, updating only auxiliary variables and a classifier. During this adaptation phase, we treat as invariant the time-delayed causal relations between latent causal variables and the mixing function that maps causal variables to action representations. Our experimental evaluations across standard action recognition datasets validate our hypothesis that our proposed method of Temporal Causal Mechanism Transfer (TCMT) enables efficient few-shot action recognition in video sequences with notable performance improvements over leading benchmarks.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 342