CSLT-AK: Convolutional-embedded transformer with an action tokenizer and keypoint emphasizer for sign language translation
Abstract: Highlights
• We propose an Action Tokenizer that divides sign language videos into action units.
• The suggested Keypoint Emphasizer module captures the elements in sign languages.
• We present a CSLT module that can reflect the global and local context of SL videos.
• We thoroughly analyze the effect and meaning of each module that constitutes CSLT-AK.
• CSLT-AK achieves competitive results over baseline and transformer-based models.