CESLR: A Multi-Signer Benchmark and SpatioTemporal End-to-End Framework for Continuous Ethiopian Sign Language Recognition

Published: 14 Dec 2025, Last Modified: 14 Dec 2025 · LM4UC@AAAI2026 · License: CC BY 4.0
Keywords: Ethiopian Sign Language, Sign Language Recognition, Deep Learning, Continuous SLR, EthSL
Abstract: Continuous Ethiopian Sign Language Recognition (CESLR) remains an underexplored challenge due to the language's rich visual dynamics and variability among signers. This work introduces CESLR, the first large-scale, multi-signer video corpus for continuous EthSL recognition, consisting of 1,320 sentence recordings performed twice by 22 participants. To process these visual sequences, we design an end-to-end deep learning architecture that jointly models spatial and temporal cues through 2D convolutional feature extraction, 1D temporal convolution, and bidirectional recurrent encoding. A Connectionist Temporal Classification (CTC) layer enables sequence alignment without frame-level annotation, allowing the network to learn directly from sentence-level supervision. Experimental evaluation shows that the system attains a Word Error Rate (WER) of 8.82% in signer-independent testing and 47.02% on unseen-sentence evaluation, revealing strong cross-signer generalization while highlighting the difficulty of novel sentence recognition. The proposed dataset and model establish a foundational benchmark for advancing automatic EthSL translation and assistive communication technologies in multilingual settings.
Submission Number: 42