Unified Pose Embeddings: Utilizing Euclidean Space for Simplified Topology Alignment

18 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Human Pose
TL;DR: Skeleton-agnostic pose representation
Abstract: Generative models for human motion synthesis have demonstrated remarkable capabilities across tasks such as text-to-motion generation, motion inbetweening, style transfer, and motion captioning. However, their adoption in industry remains limited, largely due to challenges in data representation. Industry applications often require diverse articulated skeleton topologies tailored to specific use cases, which are further constrained by limited data availability. Existing methods address these challenges by aligning datasets through shared subsets or unified representations, but these approaches rely on error-prone alignment processes, limiting their flexibility and scalability. In this work, we leverage Euclidean space to represent human poses, bypassing the need for alignment in configuration space and significantly simplifying the learning objective. Using Euclidean space also frees us from relying on a common subset representation and allows us to represent poses at any level of complexity we desire. To disentangle pose and body shape, we introduce a simple yet effective learning strategy. Our method achieves robust inverse kinematics with minimal data requirements, needing just over five minutes of motion capture data to integrate new topologies. We demonstrate the effectiveness of our topology-agnostic representation across three downstream tasks: motion retargeting, text-to-motion generation, and motion captioning.
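
The sketch below is an illustrative example only, not the authors' implementation: it contrasts a configuration-space pose (per-joint rotations tied to a specific skeleton hierarchy) with a Euclidean pose (a flat set of 3D joint positions), which requires no cross-skeleton joint mapping. The `Skeleton` class, `fk` method, and the two toy topologies are hypothetical names introduced for this example.

```python
# Minimal sketch: Euclidean pose representation vs. configuration space.
# Assumed/hypothetical: Skeleton, fk, and the toy biped/quadruped topologies.
import numpy as np


class Skeleton:
    """A minimal kinematic chain: per-joint parent index and bone offset."""

    def __init__(self, parents, offsets):
        self.parents = parents                            # parent per joint (-1 = root)
        self.offsets = np.asarray(offsets, dtype=float)   # (J, 3) rest-pose offsets

    def fk(self, rotations, root_pos):
        """Forward kinematics: per-joint rotation matrices -> world joint positions."""
        J = len(self.parents)
        world_rot = [None] * J
        world_pos = np.zeros((J, 3))
        for j in range(J):
            p = self.parents[j]
            if p < 0:
                world_rot[j] = rotations[j]
                world_pos[j] = root_pos
            else:
                world_rot[j] = world_rot[p] @ rotations[j]
                world_pos[j] = world_pos[p] + world_rot[p] @ self.offsets[j]
        return world_pos  # (J, 3) Euclidean pose, valid for any topology


# Two toy skeletons with different joint counts (i.e., different topologies).
biped = Skeleton(parents=[-1, 0, 1, 1],
                 offsets=[[0, 0, 0], [0, 0.5, 0], [0.3, 0, 0], [-0.3, 0, 0]])
quadruped = Skeleton(parents=[-1, 0, 1, 2, 1],
                     offsets=[[0, 0, 0], [0, 0.4, 0], [0.2, 0, 0],
                              [0.2, 0, 0], [-0.2, 0, 0]])

identity = lambda J: np.tile(np.eye(3), (J, 1, 1))

# Configuration-space poses (rotations) are skeleton-specific, but both skeletons
# map to the same kind of object in Euclidean space: a (J, 3) point set.
for skel in (biped, quadruped):
    pose_xyz = skel.fk(identity(len(skel.parents)), root_pos=np.zeros(3))
    print(pose_xyz.shape)  # (4, 3) and (5, 3): no shared joint subset needed
```

In this framing, downstream models consume (J, 3) joint positions directly, so adding a new skeleton does not require mapping its joints onto a common subset of an existing rig.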
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11809