A Zhuang Speech-to-Text Translation Method with Source Language-Aware Conditional Attention Mechanism
Abstract: End-to-end speech-to-text translation aims to convert spoken input directly into text in the target language. As a minority language in China, Zhuang faces challenges such as data scarcity and limited technological support in speech translation. To support research on Zhuang speech translation, we build a recording platform and use it to compile a speech-text parallel corpus. We introduce a Source Language-Aware Conditional Attention Mechanism (LACA), which incorporates source language information into the Transformer to bias attention toward Zhuang-specific linguistic features. In addition, an Implicit Connectionist Temporal Classification Auxiliary Mechanism (ICAM) is employed during training to provide auxiliary supervision for alignment between speech and text representations. Experimental results show that our model achieves a BLEU score of 32.46 on Zhuang speech-to-text translation, outperforming the baseline by 4.12 BLEU points.
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: less-resourced languages, minoritized languages, resources for less-resourced languages, cross-lingual transfer
Contribution Types: Approaches to low-resource settings
Languages Studied: Zhuang, Thai, English
Keywords: less-resourced languages, minoritized languages, resources for less-resourced languages, cross-lingual transfer
Submission Number: 2571