TopoFormer: Topology-aware Transformer for Reactive Motion Prediction in Close Interactions

22 Sept 2023 (modified: 01 Mar 2024) ICLR 2024 Conference Withdrawn Submission
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Motion prediction, 3D skeletal motion, close interaction, transformer, Gauss Linking Integral, human motion
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Topology-aware Transformer-based method for generating 3D skeletal reactive motions in two-person close interactions
Abstract: As high-quality motion datasets have become more accessible, data-driven modelling of close interactions between two or more people has attracted growing research interest in recent years. Such models can be used to understand people's intent by predicting the reactive motion when they interact closely with each other. However, state-of-the-art (SOTA) interaction prediction approaches can still fail to synchronise the motions between people and can still produce implausible motions, such as interpenetration of body parts. We argue that the motion representations in Euclidean space commonly used in previous approaches, such as joint positions and joint angles, do not capture the spatial relations between body parts effectively. In this paper, we propose a new Transformer, called 'TopoFormer', for predicting the reactive motion of one character in a two-person close interaction, given the motion of the other character and the interaction class label as input. TopoFormer consists of a Topology-Aware Spatio-Temporal Embedding and Spatial Relation-aware Multi-Headed Self Attention (SR-MSA), which together facilitate learning the latent representation of close interactions. Representing the body parts as a set of articulated chains, instead of the graph-based structure commonly used in recent works, allows the spatial relations to be captured more effectively with a topology-based representation, the Gauss Linking Integral (GLI). Experimental results highlight the effectiveness of the proposed method: it achieves SOTA performance in Aligned Mean Error (AME) and in a newly proposed metric, Average Interpenetration per Frame (AIF), across different datasets, and it produces qualitatively more synchronised and plausible interactions.
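To make the topology-based representation more concrete, the sketch below numerically approximates the Gauss Linking Integral between two articulated chains, using the standard definition GLI(γ1, γ2) = (1/4π) ∫∫ (dγ1 × dγ2) · (γ1 − γ2) / |γ1 − γ2|³. It is only an illustration and not the submission's implementation: the function names `subdivide` and `gli`, the midpoint-rule discretisation, and the toy chain coordinates are all assumptions made here, and the abstract does not say how the chains are defined or how the GLI is evaluated.

```python
import numpy as np


def subdivide(chain, n):
    """Split every segment of a polyline (K x 3 array) into n equal pieces."""
    chain = np.asarray(chain, dtype=float)
    t = np.linspace(0.0, 1.0, n, endpoint=False)[:, None]
    pieces = [p + t * (q - p) for p, q in zip(chain[:-1], chain[1:])]
    return np.vstack(pieces + [chain[-1:]])


def gli(chain_a, chain_b, n=16):
    """Midpoint-rule approximation of the Gauss Linking Integral.

    chain_a, chain_b: sequences of 3D joint positions along each chain.
    n: number of sub-segments per bone (hypothetical accuracy knob).
    The chains are assumed not to touch, otherwise the integrand blows up.
    """
    a = subdivide(chain_a, n)
    b = subdivide(chain_b, n)
    da, db = a[1:] - a[:-1], b[1:] - b[:-1]            # small tangent vectors
    ma, mb = 0.5 * (a[1:] + a[:-1]), 0.5 * (b[1:] + b[:-1])  # midpoints
    r = ma[:, None, :] - mb[None, :, :]                # pairwise offsets
    cross = np.cross(da[:, None, :], db[None, :, :])   # d(gamma1) x d(gamma2)
    dist3 = np.linalg.norm(r, axis=-1) ** 3
    integrand = np.einsum("ijk,ijk->ij", cross, r) / dist3
    return float(integrand.sum() / (4.0 * np.pi))


# Toy example: two straight "limbs" crossing each other at a small offset.
limb_a = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
limb_b = [[0.0, -1.0, 0.5], [0.0, 1.0, 0.5]]
print("GLI of the two toy chains:", gli(limb_a, limb_b))
```

The sign of the value indicates which way one chain passes over the other, and the magnitude grows as the chains wind around each other more tightly. Coplanar chains give zero, which is one reason a quantity of this kind can describe tangling that per-joint Euclidean distances do not encode directly.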
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5628