Automatic Recognition of Gesture Identity and Onset of Cued-Speech

Published: 01 Jan 2024 · Last Modified: 19 Feb 2025 · ICASSP 2024 · License: CC BY-SA 4.0
Abstract: Cued speech is a communication system based on hand gestures that is used in certain deaf communities around the world. Hand gestures in cued speech convey information complementary to what is available from lip-reading alone, which helps to communicate accurate phonological information. Cued speech offers several unique opportunities for scientific research, such as studying phonological processing via the visual modality. However, cued speech has been studied only sparsely since its invention, and few empirical datasets and standardized methods are available to the scientific community. Here, we make several contributions to advance research in the field: (1) a new cued-speech dataset annotated for various linguistic features; (2) a new approach that automatically identifies gesture identity and gesture onset from raw videos using cued-speech-specific features, achieving relatively high performance (AUC_identity > 0.95, Error_onset = 93 ms); and (3) additional insights into the relationship between sound and gesture production in cued speech, showing that syllable acoustic onset precedes gesture onset by around 150 ms on average and that this time difference is more sensitive to consonant identity than to vowel identity. We make the new dataset and all associated tools publicly available.
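As a rough illustration of the quantities reported above, the sketch below (not the authors' code; the synthetic data and all variable names are hypothetical) shows how a one-vs-rest AUC over gesture identities, a mean absolute gesture-onset error in milliseconds, and the mean acoustic-to-gesture lag could be computed with NumPy and scikit-learn.

```python
# Minimal sketch, assuming per-token gesture-class probabilities and
# per-token onset annotations; NOT the paper's implementation.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 4  # hypothetical: 4 gesture identities

# Synthetic "model" scores that favor the true class, for illustration.
true_identity = rng.integers(0, n_classes, size=n_samples)
scores = rng.random((n_samples, n_classes))
scores[np.arange(n_samples), true_identity] += 1.0
scores /= scores.sum(axis=1, keepdims=True)  # rows must sum to 1

# (2) Gesture-identity AUC, one-vs-rest over all classes.
auc_identity = roc_auc_score(true_identity, scores, multi_class="ovr")

# (2) Gesture-onset error: mean absolute difference between predicted
# and annotated onsets, in milliseconds (synthetic ~90 ms noise).
true_onset_ms = rng.uniform(0, 5000, size=n_samples)
pred_onset_ms = true_onset_ms + rng.normal(0, 90, size=n_samples)
error_onset_ms = np.mean(np.abs(pred_onset_ms - true_onset_ms))

# (3) Audio-gesture lag: mean signed offset between gesture onset and
# syllable acoustic onset (abstract: acoustic precedes by ~150 ms).
acoustic_onset_ms = true_onset_ms
gesture_onset_ms = acoustic_onset_ms + rng.normal(150, 40, size=n_samples)
lag_ms = np.mean(gesture_onset_ms - acoustic_onset_ms)

print(f"AUC_identity = {auc_identity:.3f}")
print(f"Error_onset  = {error_onset_ms:.0f} ms")
print(f"lag (gesture - acoustic) = {lag_ms:.0f} ms")
```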