SPARK: Spatio-temporal Part-based Attention for Retargeting Cross-skeleton Motion

02 Sept 2025 (modified: 12 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: motion retargeting, skeleton processing
Abstract: Cross-skeleton motion retargeting remains a challenging problem in computer animation, particularly when dealing with characters having significantly different skeletal structures. Existing methods often struggle to preserve motion semantics while adapting to diverse skeleton topologies. We propose a novel transformer-based approach that leverages group-based body part processing and spatio-temporal attention mechanisms. Our method organizes joints into semantic body groups and employs attention pooling to generate robust representations that capture both local joint relationships and global body dynamics. A transformer encoder models temporal dependencies across these body-part tokens, learning motion patterns invariant to specific skeletal configurations. The decoder uses cross-attention to enable fine-grained motion transfer by attending to spatial body part correspondences and temporal motion patterns. We incorporate T-pose conditioning and joint text embeddings to provide anatomical structure awareness during retargeting. Evaluation on the Mixamo dataset demonstrates particular strength in handling complex skeletal variations while maintaining motion quality and semantic consistency. We will release the code to facilitate reproducibility and future research.
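To make the group-based attention pooling described in the abstract more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the joint names, the grouping into `left_arm` and `torso`, the 2-dimensional features, and the single shared `query` vector are all hypothetical. In a trained model the query and the joint features would be learned.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Vector], query: Vector) -> Vector:
    """Pool a group's joint features into one token.

    Each joint is scored by a scaled dot product with the query; the token
    is the softmax-weighted sum of the joint features.
    """
    dim = len(query)
    scores = [
        sum(f_d * q_d for f_d, q_d in zip(feat, query)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)
    return [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]


def pool_body_parts(
    joint_features: Dict[str, Vector],
    groups: Dict[str, List[str]],
    query: Vector,
) -> Dict[str, Vector]:
    """Produce one token per body-part group, whatever the joint count per group."""
    return {
        part: attention_pool([joint_features[j] for j in joints], query)
        for part, joints in groups.items()
    }


# Hypothetical skeleton: names, grouping, and feature values are assumptions.
joint_features = {
    "LeftShoulder": [1.0, 0.0],
    "LeftElbow": [0.0, 1.0],
    "LeftHand": [1.0, 1.0],
    "Spine": [0.5, 0.5],
}
groups = {
    "left_arm": ["LeftShoulder", "LeftElbow", "LeftHand"],
    "torso": ["Spine"],
}
query = [0.0, 0.0]  # a zero query scores every joint equally, i.e. mean pooling

tokens = pool_body_parts(joint_features, groups, query)
```

Because every group collapses to a single fixed-size token, skeletons with different joint counts yield token sequences of the same shape, which is one plausible reading of how part-level tokens could support retargeting across topologies.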
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 1063