Abstract: Despite recent advances in Large Language Models in text processing, Sign Language Recognition (SLR) remains an unresolved task. This is, in part, due to limitations in the available data. In this paper, we investigate combining 1D convolutions with transformer layers to capture local features and global interactions in a low-parameter SLR model. We experimented using multiple data augmentation and regularization techniques to categorize signs of the French Belgian Sign Language. We achieved a top-1 accuracy of 42.7% and a top-10 accuracy of 81.9% in 600 different signs. This model is competitive with the current state of the art while using a significantly lower number of parameters.
External IDs:dblp:journals/jcsandt/RiosBRQASH24
Loading