Semantic-Weighted Word Error Rate Based on BERT for Evaluating Automatic Speech Recognition Models

Published: 01 Jan 2024, Last Modified: 14 May 2025 · DSA 2024 · CC BY-SA 4.0
Abstract: Traditional evaluation metrics fail to differentiate the importance of words and therefore cannot provide detailed, accurate assessments of Automatic Speech Recognition (ASR) models. To address these limitations, this paper introduces the Semantic-Weighted Word Error Rate (SWWER), which leverages the BERT model to assign each word a weight based on its contribution to the semantic content of the text, enabling a more accurate and nuanced evaluation of ASR models. To assess ASR models more comprehensively from both lexical and semantic perspectives, this study also proposes a hybrid evaluation metric, H_eval, defined as the harmonic mean of SWWER and Semantic Similarity (SD). Furthermore, this paper explores the potential of Large Language Models (LLMs) for correcting ASR transcriptions. Notably, SWWER effectively reflects the impact of LLM corrections on the transcribed text: for the same number of erroneous words, higher weights assigned to those errors correlate with greater semantic loss, potentially implying less effective correction by the LLM. The SWWER and H_eval metrics proposed in this paper not only offer precise and comprehensive evaluations but also provide new insights for improving ASR systems and optimizing their performance through the use of LLMs.
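The abstract does not state the exact formulas, so the following is only a minimal sketch under assumed definitions: substitutions and deletions are charged the reference word's semantic weight (which the paper derives from BERT; here the weights are simply passed in), insertions are charged the hypothesis word's weight, the total cost is normalized by the summed reference weights, and H_eval is taken as the harmonic mean of ASR accuracy (1 − SWWER) and a semantic-similarity score. The function names `weighted_wer` and `h_eval` are illustrative, not from the paper.

```python
def weighted_wer(ref, hyp, ref_w, hyp_w):
    """Semantic-weighted WER via weighted edit distance (assumed formulation).

    ref, hyp     -- lists of words (reference and ASR hypothesis)
    ref_w, hyp_w -- per-word semantic weights (e.g. BERT-derived)
    """
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum weighted edit cost between ref[:i] and hyp[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # deleting ref words
        d[i][0] = d[i - 1][0] + ref_w[i - 1]
    for j in range(1, m + 1):            # inserting hyp words
        d[0][j] = d[0][j - 1] + hyp_w[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if ref[i - 1] == hyp[j - 1] else ref_w[i - 1]
            d[i][j] = min(d[i - 1][j - 1] + sub,          # match / substitute
                          d[i - 1][j] + ref_w[i - 1],     # delete
                          d[i][j - 1] + hyp_w[j - 1])     # insert
    return d[n][m] / sum(ref_w)

def h_eval(swwer, sd):
    """Assumed hybrid metric: harmonic mean of accuracy (1 - SWWER) and SD."""
    acc = 1.0 - swwer
    return 2 * acc * sd / (acc + sd) if acc + sd else 0.0
```

With uniform weights this reduces to the ordinary word error rate; raising the weight of a semantically important word makes an error on that word cost proportionally more, which is the behavior the abstract attributes to SWWER.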