Talk with Your Fingers: A Depth-Aware Benchmark for Air-Writing Recognition

Meiqi Wu, Yuzhong Zhao, Xuchen Li, Shiyu Hu, Yuanqiang Cai, Jiahong Wu, Weiqiang Wang, Kaiqi Huang

Published: 01 Jan 2026, Last Modified: 06 Mar 2026. Venue: IEEE Transactions on Circuits and Systems for Video Technology. License: CC BY-SA 4.0

Abstract: Air-writing has emerged as a promising communication modality for AR/VR and metaverse environments, enabling quiet, non-contact text input by translating finger movements into natural language. However, existing approaches typically project in-air writing onto a virtual 2D plane and assume characters are formed with a single continuous stroke, an oversimplification that neglects the rich 3D structure inherent in natural handwriting. In this work, we challenge the “single-stroke 2D” paradigm and explore the role of depth cues in enhancing air-writing recognition. To this end, we present DAAWBench, the first large-scale Depth-Aware Air-Writing dataset, featuring 8.8 million RGB-D frames annotated with 3,755 Chinese characters from the GB2312-80 Level-1 set. Our analysis reveals consistent depth variations at stroke boundaries, indicating that stroke segmentation and character recognition can benefit from depth modeling. Based on these insights, we propose DARec, a novel 3D trajectory-based recognition model that effectively leverages depth-aware priors. Extensive experiments across in-domain and out-of-domain settings, including evaluations with vision-language models (e.g., GPT-4o, Qwen-VL) and human baselines, show that DARec significantly outperforms 2D-only counterparts, achieving 87.73% accuracy versus 9.05%. Our findings demonstrate the critical importance of depth modeling in human-computer co-creative interfaces, and we will publicly release our dataset and code at https://github.com/wmeiqi/DAAWBench.
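
The observation that depth varies consistently at stroke boundaries can be illustrated with a minimal sketch. It is not the DARec model and does not use the DAAWBench data format. It assumes a fingertip trajectory is already available as (x, y, z) tuples, that the finger leaves an approximately constant-depth writing plane between strokes, and that a fixed threshold separates writing from lifting. The function name `split_strokes` and the parameter `lift_threshold` are hypothetical.

```python
from statistics import median


def split_strokes(points, lift_threshold=0.02):
    """Split a 3D fingertip trajectory into 2D strokes using depth alone.

    points: list of (x, y, z) tuples, one per frame; z is depth (e.g. metres).
    lift_threshold: assumed depth deviation from the writing plane that
        counts as a pen lift (hypothetical value, not from the paper).
    """
    if not points:
        return []
    # Crude writing-plane estimate: the median depth over the whole trajectory.
    plane_z = median(p[2] for p in points)
    strokes, current = [], []
    for x, y, z in points:
        if abs(z - plane_z) <= lift_threshold:
            # Frame lies near the writing plane: extend the current stroke.
            current.append((x, y))
        elif current:
            # Depth moved away from the plane: close the current stroke.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes
```

A 2D projection of the same trajectory would discard z and merge both strokes with the connecting movement into one continuous path, which is the "single-stroke 2D" simplification the abstract argues against.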

External IDs: doi:10.1109/tcsvt.2026.3670256