Abstract: This paper argues that current Natural Language Processing (NLP) frameworks are fundamentally misaligned with sign language processing (SLP) because they rely on the linear, single-channel linguistic models developed for spoken language. We analyze the fundamental differences between spoken and signed languages along four critical dimensions: (1) multi-modal vs. multi-channel representation, (2) low- vs. high-resource data availability and annotation efficiency, (3) disambiguation vs. channel conversion, and (4) linear vs. spatial representation. Existing research focuses primarily on surface-level forms, neglecting the deep semantic structures that depend on the coordinated multi-channel features inherent to sign languages.
We identify three underexplored challenges that highlight these gaps: the spatial modeling demands of text-to-scene conversion, the dual representation problem in spatial metaphors, and the complexity of classifier predicate decomposition. These challenges demonstrate that SLP cannot be reduced to video-to-text or text-to-video translation; instead, it requires a fundamental rethinking of NLP’s core assumptions to integrate the spatial-semantic structures of sign languages.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingual / low resource; linguistic theories; cognitive modeling; computational psycholinguistics; less-resourced languages; morphological analysis; phonology
Contribution Types: Position papers
Languages Studied: English, Chinese, American Sign Language (ASL), Chinese Sign Language (CSL)
Submission Number: 8417