SiLVERScore: Why Aren't Semantically-Aware Embeddings Used for Sign Language Generation Evaluation?

ACL ARR 2024 December Submission1278 Authors

16 Dec 2024 (modified: 05 Feb 2025); ACL ARR 2024 December Submission; CC BY 4.0

Abstract: Evaluating sign language generation has traditionally relied on back-translation, where generated signs are converted into text and assessed using text-based metrics. However, this approach presents two significant challenges: (i) it causes substantial information loss, failing to capture the multimodal nature of sign language (such as facial expressions, spatial structure, and prosody), and (ii) errors introduced during back-translation propagate through the evaluation pipeline. In this work, we propose SiLVERScore, a novel semantically aware, embedding-based evaluation metric that assesses sign language generation in a joint embedding space. Our contributions include: (1) identifying limitations of existing metrics, (2) introducing SiLVERScore for semantically aware evaluation, (3) demonstrating its robustness to semantic and prosodic variations, and (4) exploring generalization challenges across datasets. SiLVERScore offers a step toward more reliable evaluation of sign language generation systems.
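To illustrate what "evaluation in a joint embedding space" can mean, here is a minimal sketch that scores a generated sign video against its reference text by comparing their embeddings directly, with no back-translation step. The abstract does not give SiLVERScore's formula, so this sketch swaps in a CLIPScore-style formulation, w * max(cos(video, text), 0) with w = 2.5, as an assumption. The function names `cosine_similarity` and `embedding_score` are hypothetical, and the input vectors are assumed to come from some video encoder and text encoder that share one embedding space; the encoders themselves are not shown.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def embedding_score(video_emb: Sequence[float],
                    text_emb: Sequence[float],
                    w: float = 2.5) -> float:
    """Hypothetical CLIPScore-style score: w * max(cos(video, text), 0).

    Assumption: both vectors were produced by encoders trained into one
    joint sign-video/text space. The weight w = 2.5 follows CLIPScore and
    is illustrative only; it is not taken from the SiLVERScore paper.
    Negative similarities are clipped to 0, so scores lie in [0, w].
    """
    return w * max(cosine_similarity(video_emb, text_emb), 0.0)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real encoder outputs.
    print(embedding_score([1.0, 0.0], [1.0, 0.0]))   # identical direction
    print(embedding_score([3.0, 4.0], [4.0, 3.0]))   # close but not equal
    print(embedding_score([1.0, 0.0], [-1.0, 0.0]))  # opposite, clipped to 0
```

Because the comparison happens between embeddings, no intermediate text transcription is produced, so recognition errors of the kind the abstract attributes to back-translation cannot enter through that route; how well non-manual cues such as facial expressions are preserved then depends entirely on the (here unspecified) video encoder.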

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: metrics, low resource languages, automatic evaluation of datasets, evaluation methodologies, evaluation, multilingual evaluation

Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data analysis, Position papers, Surveys

Languages Studied: German Sign Language, American Sign Language, Swiss German Sign Language

Submission Number: 1278