EmoSign: A Multimodal Dataset for Understanding Emotions in American Sign Language

ICLR 2026 Conference Submission 14816 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: sign language, emotion recognition, dataset
Abstract: Unlike spoken languages, where the use of prosodic features to convey emotion is well studied, indicators of emotion in sign language remain poorly understood, creating communication barriers in critical settings. Sign languages present unique challenges because facial expressions and hand movements simultaneously serve both grammatical and emotional functions. To address this gap, we introduce EmoSign, the first sign video dataset containing sentiment and emotion labels for 200 American Sign Language (ASL) videos. We also collect open-ended descriptions of the emotion cues, such as specific expressions and signing speed, that lead to the identified emotions. Annotations were provided by three Deaf ASL signers with professional interpretation experience. Alongside the annotations, we include benchmarks of baseline models for sentiment and emotion classification. Our benchmark results show that current multimodal models fail to integrate visual cues into emotional reasoning and exhibit a bias towards positive emotions. This dataset not only addresses a critical gap in existing sign language research but also establishes a new benchmark for understanding model capabilities in multimodal emotion recognition for sign languages. This work can inspire new architectures that integrate fine-grained visual understanding with linguistic context awareness to distinguish, for example, syntactic from affective functions of visual cues.
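To make the annotation and benchmarking setup concrete, below is a minimal sketch of how per-video labels from the three annotators might be aggregated and scored against model predictions. The record fields (video_id, annotator, sentiment, emotions, cue_description), the label values, and the aggregation scheme are illustrative assumptions, not the dataset's actual schema or evaluation protocol.

```python
from collections import Counter

# Hypothetical annotation records for one EmoSign video; field names and
# label sets are assumptions for illustration, not the released schema.
EXAMPLE_ANNOTATIONS = [
    {"video_id": "asl_0001", "annotator": "A1", "sentiment": "positive",
     "emotions": ["joy"], "cue_description": "raised brows, fast signing"},
    {"video_id": "asl_0001", "annotator": "A2", "sentiment": "positive",
     "emotions": ["joy", "surprise"], "cue_description": "wide eyes"},
    {"video_id": "asl_0001", "annotator": "A3", "sentiment": "neutral",
     "emotions": [], "cue_description": "neutral face, steady pace"},
]


def aggregate_sentiment(records):
    """Majority-vote sentiment per video across annotators (assumed scheme)."""
    by_video = {}
    for r in records:
        by_video.setdefault(r["video_id"], []).append(r["sentiment"])
    return {vid: Counter(labels).most_common(1)[0][0]
            for vid, labels in by_video.items()}


def sentiment_accuracy(gold, predictions):
    """Accuracy of model predictions against the aggregated gold labels."""
    correct = sum(predictions.get(vid) == label for vid, label in gold.items())
    return correct / len(gold)


if __name__ == "__main__":
    gold = aggregate_sentiment(EXAMPLE_ANNOTATIONS)
    preds = {"asl_0001": "positive"}  # placeholder model output
    print(gold, sentiment_accuracy(gold, preds))
```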
Primary Area: datasets and benchmarks
Submission Number: 14816