Translating Motion to Notation: Hand Labanotation for Intuitive and Comprehensive Hand Movement Documentation

Published: 20 Jul 2024 · Last Modified: 04 Aug 2024 · MM 2024 Oral · CC BY 4.0
Abstract: Symbols play a pivotal role in the documentation and dissemination of art. For instance, we use musical scores and dance notation to document musical compositions and choreographic movements. Existing hand representations are ill-suited to hand movement documentation because (1) data-oriented representations, e.g., the coordinates of hand keypoints, are unintuitive and vulnerable to noise, and (2) sign language, another widely adopted representation of hand movements, focuses solely on semantic interaction rather than action encoding. To balance intuitiveness and precision, we propose a novel notation system, named Hand Labanotation (HL), for hand movement documentation. We first introduce a new HL dataset comprising $4$M annotated images. On this basis, we propose a novel multi-view transformer architecture for automatically translating hand movements to HL. Extensive experiments demonstrate the promising capacity of our method to represent hand movements, making it a general tool for hand movement documentation and driving downstream applications such as using HL to control robotic hands.
Primary Subject Area: [Experience] Art and Culture
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work advances multimedia and multimodal processing through several key innovations. (1) Innovative documentation technique: the HL notation system provides a new method for documenting complex hand movements, enhancing multimedia content creation and editing by improving the quality and efficiency of motion capture and reproduction. (2) Multimodal dataset expansion: by assembling a dataset of over 4 million annotated images of hand movements, this research broadens the scope of multimodal data and supplies valuable resources for gesture recognition and interaction in virtual and augmented reality applications. (3) Advanced machine learning application: using a novel multi-view transformer architecture, this paper automates the conversion of images into Labanotation scores, improving machines' ability to understand and process visual and motion information and pushing forward multimodal data integration and analysis. (4) Broad applications and impact: the validated methodology supports precise documentation of artistic performances and everyday gestures, and extends to robotic technologies such as automated sign language translation and fine control of robotic hands, advancing human-machine interaction. Together, these contributions expand the capabilities of multimedia and multimodal processing, offering new avenues for documenting, analyzing, and reproducing complex movements.
Supplementary Material: zip
Submission Number: 1054