Unified Pose Embeddings: Utilizing Euclidean Space for Simplified Topology Alignment

18 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Human Pose
TL;DR: Skeleton-agnostic pose representation
Abstract: Generative models for human motion synthesis have demonstrated remarkable capabilities across tasks such as text-to-motion generation, motion inbetweening, style transfer, and motion captioning. However, their adoption in industry remains limited, largely due to challenges in data representation. Industry applications often require diverse articulated skeleton topologies tailored to specific use cases, which are further constrained by limited data availability. Existing methods address these challenges by aligning datasets through shared subsets or unified representations, but these approaches rely on error-prone alignment processes, limiting their flexibility and scalability. In this work, we leverage Euclidean space to represent human poses, bypassing the need for alignment in configuration space and significantly simplifying the learning objective. Using Euclidean space also frees us from relying on a common subset representation and allows us to represent poses at any level of complexity we desire. To disentangle pose and body shape, we introduce a simple yet effective learning strategy. Our method achieves robust inverse kinematics with minimal data requirements, needing just over five minutes of motion capture data to integrate new topologies. We demonstrate the effectiveness of our topology-agnostic representation across three downstream tasks: motion retargeting, text-to-motion generation, and motion captioning.
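
The sketch below is an illustrative example only, not the authors' implementation: it contrasts a configuration-space pose (per-joint rotations tied to a specific skeleton hierarchy) with a Euclidean pose (a flat set of 3D joint positions), which requires no cross-skeleton joint mapping. The `Skeleton` class, `fk` method, and the two toy topologies are hypothetical names introduced for this example.

```python
# Minimal sketch: Euclidean pose representation vs. configuration space.
# Assumed/hypothetical: Skeleton, fk, and the toy biped/quadruped topologies.
import numpy as np


class Skeleton:
    """A minimal kinematic chain: per-joint parent index and bone offset."""

    def __init__(self, parents, offsets):
        self.parents = parents                            # parent per joint (-1 = root)
        self.offsets = np.asarray(offsets, dtype=float)   # (J, 3) rest-pose offsets

    def fk(self, rotations, root_pos):
        """Forward kinematics: per-joint rotation matrices -> world joint positions."""
        J = len(self.parents)
        world_rot = [None] * J
        world_pos = np.zeros((J, 3))
        for j in range(J):
            p = self.parents[j]
            if p < 0:
                world_rot[j] = rotations[j]
                world_pos[j] = root_pos
            else:
                world_rot[j] = world_rot[p] @ rotations[j]
                world_pos[j] = world_pos[p] + world_rot[p] @ self.offsets[j]
        return world_pos  # (J, 3) Euclidean pose, valid for any topology


# Two toy skeletons with different joint counts (i.e., different topologies).
biped = Skeleton(parents=[-1, 0, 1, 1],
                 offsets=[[0, 0, 0], [0, 0.5, 0], [0.3, 0, 0], [-0.3, 0, 0]])
quadruped = Skeleton(parents=[-1, 0, 1, 2, 1],
                     offsets=[[0, 0, 0], [0, 0.4, 0], [0.2, 0, 0],
                              [0.2, 0, 0], [-0.2, 0, 0]])

identity = lambda J: np.tile(np.eye(3), (J, 1, 1))

# Configuration-space poses (rotations) are skeleton-specific, but both skeletons
# map to the same kind of object in Euclidean space: a (J, 3) point set.
for skel in (biped, quadruped):
    pose_xyz = skel.fk(identity(len(skel.parents)), root_pos=np.zeros(3))
    print(pose_xyz.shape)  # (4, 3) and (5, 3): no shared joint subset needed
```

In this framing, downstream models consume (J, 3) joint positions directly, so adding a new skeleton does not require mapping its joints onto a common subset of an existing rig.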
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11809