Benchmarking Data Augmentation for Contrastive Learning in Static Sign Language Recognition

Ariel Basso Madjoukeng, Jerome Fink, Pierre Poitier, Edith Belise Kenmogne, Benoît Frénay

Published: 15 Jan 2025, Last Modified: 06 Mar 2026, ESANN 2024, CC BY-SA 4.0

Abstract: Sign language (SL) is a communication method used by deaf people. Static sign language recognition (SLR) is a challenging task that aims to identify signs in images, and acquiring annotated data for it is time-consuming. To leverage unannotated data, practitioners have turned to unsupervised methods. Contrastive representation learning has proved effective at capturing important features from unannotated data. The performance of a contrastive model is known to depend on the data augmentation techniques used during training. For various applications, a set of effective data augmentations has been identified, but this is not yet the case for SL. This paper identifies the most effective augmentations for static SLR. The results show a difference in accuracy of up to 30% between appearance-based augmentations combined with translations and augmentations based on rotations, erasing, or vertical flips.

External IDs: doi:10.14428/esann/2025.es2025-142