STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

22 Sept 2022 (modified: 08 Sept 2024) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: mesh-based action recognition, motion capture, transformer
TL;DR: We propose the first mesh-based action recognition method, which achieves state-of-the-art performance over skeleton-based and point-cloud-based models.
Abstract: We study the problem of human action recognition using motion capture (MoCap) sequences. Existing methods for MoCap-based action recognition take skeletons as input, which requires an extra manual mapping step and discards body shape information. We therefore propose a novel method that directly models raw mesh sequences, allowing it to benefit from the body shape prior and surface motion. We propose a new hierarchical transformer with intra- and inter-frame attention to learn effective spatial-temporal representations. Moreover, we define two self-supervised learning tasks, namely masked vertex modeling and future frame prediction, to further learn global context for appearance and motion. Our model achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models. We will release our code and models.
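The abstract's core idea, alternating attention within each frame (over mesh-vertex tokens) and across frames (over time), can be sketched in a few lines. The following is a minimal illustration assuming PyTorch; the tensor layout, layer sizes, vertex tokenization, and the `SpatialTemporalBlock` name are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of intra-/inter-frame attention over a mesh sequence.
# Shapes and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class SpatialTemporalBlock(nn.Module):
    """One block: intra-frame (spatial) then inter-frame (temporal) attention."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens_per_frame, dim); each token is a
        # mesh-vertex (or vertex-patch) embedding.
        b, t, v, d = x.shape

        # Intra-frame attention: tokens within one frame attend to each other.
        s = x.reshape(b * t, v, d)
        n = self.norm1(s)
        s = s + self.spatial_attn(n, n, n)[0]
        x = s.reshape(b, t, v, d)

        # Inter-frame attention: each token attends to itself across frames.
        m = x.permute(0, 2, 1, 3).reshape(b * v, t, d)
        n = self.norm2(m)
        m = m + self.temporal_attn(n, n, n)[0]
        return m.reshape(b, v, t, d).permute(0, 2, 1, 3)

# Toy usage: 2 sequences, 8 frames, 64 vertex tokens, 128-dim embeddings.
block = SpatialTemporalBlock()
out = block(torch.randn(2, 8, 64, 128))
print(out.shape)  # torch.Size([2, 8, 64, 128])
```

Factoring attention into a per-frame pass followed by a per-token temporal pass keeps the cost at roughly O(T·V² + V·T²) rather than O((T·V)²) for full joint attention, which is what makes attending over dense mesh vertices tractable.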
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Applications (e.g., speech processing, computer vision, NLP)
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/stmt-a-spatial-temporal-mesh-transformer-for/code)