SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation
Keywords: Grasp Synthesis, Articulated Object, Multi-Modal Learning
TL;DR: We construct a large-scale high-quality hand-object interaction dataset with a physics engine and propose a multi-modal VQ-VAE-based generative model for articulated object grasp synthesis.
Abstract: Generating hand grasps from language instructions is a widely studied topic that benefits embodied AI and VR/AR applications. When the task is extended to **H**and **A**rticulated **O**bject **I**nteraction (**HAOI**), hand grasp synthesis must account not only for object functionality but also for a long-term manipulation sequence that follows the object's deformation. This paper proposes a novel HAOI sequence generation framework, **SynHLMA**, to **Syn**thesize **H**and **L**anguage **M**anipulation for **A**rticulated objects. Given a complete point cloud of an articulated object, we use a discrete HAOI representation to model each hand-object interaction frame. Together with natural language embeddings, these representations are used to train an HAOI Manipulation Language Model that aligns the grasping process with its language description in a shared representation space. An **articulation-aware loss** ensures that hand grasps follow the dynamic variations of the articulated object's joints. SynHLMA thereby supports three typical hand manipulation tasks for articulated objects: HAOI generation, HAOI prediction, and HAOI interpolation. We evaluate SynHLMA on our HAOI-lang dataset, and experimental results demonstrate superior hand grasp sequence generation performance compared with state-of-the-art methods. We also show a robotic grasping application that enables dexterous grasp execution through imitation learning on the manipulation sequences produced by SynHLMA. Our code and datasets will be made publicly available.
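As a rough illustration of what a discrete per-frame representation can look like, the sketch below shows the nearest-neighbor codebook lookup at the core of a VQ-VAE-style quantizer, which the TL;DR names as the basis of the model. It is not the authors' implementation: the function name `quantize`, the toy codebook, and the 2-D feature size are all assumptions chosen for readability.

```python
import numpy as np


def quantize(features, codebook):
    """Map each frame feature to its nearest codebook entry.

    features: (T, D) array, one continuous feature per interaction frame.
    codebook: (K, D) array of learned code vectors (hypothetical values here).
    Returns the (T,) token indices and the (T, D) quantized features.
    """
    # Squared Euclidean distance between every frame and every code vector.
    diffs = features[:, None, :] - codebook[None, :, :]  # (T, K, D)
    dists = np.sum(diffs ** 2, axis=-1)                  # (T, K)
    indices = np.argmin(dists, axis=1)                   # (T,)
    return indices, codebook[indices]


# Toy codebook with four 2-D entries (assumed, for illustration only).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Three frame features, each lying close to one codebook entry.
features = np.array([[0.1, -0.1], [0.9, 0.2], [0.8, 0.9]])

tokens, quantized = quantize(features, codebook)
print(tokens)
```

Each frame is reduced to a single integer token, which is the kind of sequence a language model can consume alongside text tokens.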
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6863