Unsupervised Motion Representation Learning with Capsule Autoencoders

Ziwei Xu; Xudong Shen; Yongkang Wong; Mohan Kankanhalli

Published: 09 Nov 2021, Last Modified: 26 May 2025. Venue: NeurIPS 2021 Poster. Readers: Everyone

Keywords: Capsule Network, motion representation, skeleton-based action recognition

TL;DR: We propose MCAE, a capsule-based motion representation. Experiments on a novel synthetic dataset and on skeleton-based action recognition datasets show its discriminative power and its robustness to transformations.

Abstract: We propose the Motion Capsule Autoencoder (MCAE), which addresses a key challenge in the unsupervised learning of motion representations: transformation invariance. MCAE models motion in a two-level hierarchy. At the lower level, a spatio-temporal motion signal is divided into short, local, and semantic-agnostic snippets. At the higher level, the snippets are aggregated to form full-length, semantic-aware segments. At both levels, we represent motion with a set of learned transformation-invariant templates and the corresponding geometric transformations, using capsule autoencoders of a novel design. This leads to a robust and efficient encoding of viewpoint changes. MCAE is evaluated on a novel Trajectory20 motion dataset and on various real-world skeleton-based human action datasets. Notably, it achieves better results than baselines on Trajectory20 with considerably fewer parameters, and state-of-the-art performance on the unsupervised skeleton-based action recognition task.
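
The two ideas in the abstract, splitting a motion signal into short snippets and describing each snippet as a template plus a geometric transformation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: nothing is learned here, the "template" is simply the first snippet of a toy trajectory, and the names `split_into_snippets`, `apply_transform`, and `rotation` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def split_into_snippets(trajectory, snippet_len):
    """Split a (T, 2) trajectory into non-overlapping (snippet_len, 2) snippets.

    Trailing points that do not fill a whole snippet are dropped.
    """
    num = trajectory.shape[0] // snippet_len
    return trajectory[: num * snippet_len].reshape(num, snippet_len, 2)

def apply_transform(points, matrix):
    """Apply a 3x3 homogeneous 2D transform to an (L, 2) array of points."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ matrix.T)[:, :2]

def rotation(theta):
    """3x3 homogeneous matrix for a counter-clockwise rotation by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Toy 2D trajectory: one period of a sine wave sampled at 32 points.
t = np.linspace(0.0, 1.0, 32)
trajectory = np.stack([t, np.sin(2 * np.pi * t)], axis=1)

# Lower level (illustrative): short, local snippets of 8 points each.
snippets = split_into_snippets(trajectory, 8)

# Assumption: use the first snippet as a stand-in for a learned template.
template = snippets[0]

# A viewpoint change modeled as a 30-degree rotation of the snippet.
view_change = rotation(np.pi / 6)
observed = apply_transform(template, view_change)

# The template itself is unchanged; only the transformation differs.
# Undoing the transformation recovers the template exactly.
recovered = apply_transform(observed, np.linalg.inv(view_change))
```

In this toy setting the viewpoint change is captured entirely by `view_change`, while `template` stays fixed, which is the sense in which a template-plus-transformation encoding can be robust to such changes.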

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/unsupervised-motion-representation-learning/code)