Toward Human Deictic Gesture Target Estimation

Published: 18 Sept 2025. Last Modified: 29 Oct 2025. NeurIPS 2025 poster. License: CC BY-NC 4.0
Keywords: Deictic Gesture, Gesture Target Estimation, Social Interaction, Social Artificial Intelligence
TL;DR: We propose the task of human deictic gesture target estimation and the first model for this task.
Abstract: Humans have a remarkable ability to use co-speech deictic gestures, such as pointing and showing, to enrich verbal communication and support social interaction. These gestures are so fundamental that infants begin to use them even before they acquire spoken language, which highlights their central role in human communication. Understanding the intended targets of another individual's deictic gestures enables inference of their intentions, comprehension of their current actions, and prediction of their upcoming behaviors. Despite its significance, gesture target estimation remains an underexplored task within the computer vision community. In this paper, we introduce GestureTarget, a novel task designed specifically for comprehensive evaluation of social deictic gesture semantic target estimation. To address this task, we propose TransGesture, a set of Transformer-based gesture target prediction models. Given an input image and the spatial location of a person, our models predict the intended target of that person's gesture within the scene. Critically, our gaze-aware joint cross-attention fusion model demonstrates that incorporating gaze-following cues significantly improves gesture target mask prediction IoU by 6% and gesture existence prediction accuracy by 10%. Our results underscore the complexity and importance of integrating gaze cues into deictic gesture intention understanding, advocating for increased research attention to this emerging area. All data and code will be made publicly available upon acceptance. Code for TransGesture is available at GitHub.com/IrohXu/TransGesture.
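The abstract's gaze-aware joint cross-attention fusion can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the module name, feature dimensions, and the two prediction heads (a coarse 14x14 target-mask head and a gesture-existence head) are illustrative assumptions; only the general idea, gesture-query features attending to gaze-cue features via cross-attention before prediction, comes from the abstract.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hypothetical sketch of gaze-aware cross-attention fusion:
    gesture-query tokens attend to gaze-cue tokens, and the fused
    features feed a target-mask head and a gesture-existence head."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.exist_head = nn.Linear(dim, 1)    # gesture existence logit (assumed head)
        self.mask_head = nn.Linear(dim, 196)   # coarse 14x14 target-mask logits (assumed head)

    def forward(self, gesture_feats: torch.Tensor, gaze_feats: torch.Tensor):
        # gesture_feats: (B, Nq, dim) queries; gaze_feats: (B, Nk, dim) keys/values
        fused, _ = self.attn(gesture_feats, gaze_feats, gaze_feats)
        fused = self.norm(gesture_feats + fused)  # residual connection + layer norm
        pooled = fused.mean(dim=1)                # pool query tokens
        return self.mask_head(pooled), self.exist_head(pooled)

model = CrossAttentionFusion()
mask_logits, exist_logit = model(torch.randn(2, 8, 64), torch.randn(2, 16, 64))
```

Under these assumptions, removing the cross-attention step (predicting from gesture features alone) would be the gaze-unaware baseline that the reported +6% IoU and +10% existence-accuracy gains are measured against.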
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 14233