Keywords: Continuous sign language recognition, Video-to-text generation, Weakly supervised, Contrastive learning
TL;DR: Performing continuous sign language recognition through optimized video-to-text generation.
Abstract: Cross-modal alignment is a common approach to continuous sign language recognition (CSLR). However, due to the weakly supervised nature of CSLR, manual alignment often fails to map sign frames to glosses accurately.
In this paper, we propose a diffusion-based framework that approaches CSLR from a new perspective, cross-modal generation, leveraging the inherent semantic consistency between sign videos and glosses. To address the ambiguous boundaries in sign videos, we also develop a contrastive learning-based feature enhancement strategy, which serves as a more sophisticated alternative to the simple attention mechanisms commonly used in text-to-image generation tasks.
Extensive experiments on three public sign language recognition datasets demonstrate the effectiveness of the generative approach to CSLR, which achieves better performance than state-of-the-art methods.
The code of our method will be available upon acceptance.
Supplementary Material: pdf
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 364