Abstract: End-to-end speech translation (ST) presents notable disambiguation challenges as it necessitates simultaneous cross-modal and cross-lingual transformations. While word sense disambiguation is an extensively investigated topic in textual machine translation, the exploration of disambiguation strategies for ST models remains limited. Addressing this gap, this paper introduces the concept of speech sense disambiguation (SSD), specifically emphasizing homophones - words pronounced identically but with different meanings. To facilitate this, we first create a comprehensive homophone dictionary and an annotated dataset rich with homophone information established based on speech-text alignment. Building on this unique dictionary, we introduce AmbigST, an innovative homophone-aware contrastive learning approach that integrates a homophone-aware masking strategy. Our experiments on different MuST-C and CoVoST ST benchmarks demonstrate that AmbigST sets new performance standards. Specifically, it achieves SOTA results on BLEU scores for English to German, Spanish, and French ST tasks, underlining its effectiveness in reducing speech sense ambiguity. Data, code, and scripts will be released.
Paper Type: long
Research Area: Machine Translation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, German, Spanish, French
0 Replies
Loading