Enhancing ASR Performance through Relative Word Frequency in OCR and Normal Word Frequency Analysis

Published: 01 Jan 2024, Last Modified: 24 Jun 2025, Venue: AICAS 2024, License: CC BY-SA 4.0
Abstract: With growing interest in Conversational AI, in which machines engage in human-like dialogue, Automatic Speech Recognition (ASR) has received increased attention as an essential component. Despite ongoing research, ASR performance still falls short in real-life applications such as academic lectures that contain technical terms. This paper proposes a method to improve the recognition of technical terms frequently used in academic lectures, thereby improving overall ASR performance. The proposed method improves on the method that analyzes the ratio between the frequency of words extracted by Optical Character Recognition (OCR) and the frequency of common words in order to recognize technical terms accurately. The improvement is based on the power law, which is widely used in the scientific community. Experiments on 108 hours of the ‘Advanced Compiler’ lecture showed a reduction in Word Error Rate (WER) of up to 3.22%.
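The frequency-ratio idea in the abstract can be illustrated with a minimal sketch. It assumes that each OCR-extracted word is scored by its relative frequency in the lecture material divided by its relative frequency in general text, so that words that are common on the slides but rare in everyday language rank highest. The function names, the smoothing constant, and the toy frequencies are hypothetical and are not taken from the paper, and the power-law-based refinement the paper proposes is not reproduced here.

```python
from collections import Counter


def relative_frequency_scores(ocr_words, general_freq, smoothing=1e-6):
    """Score each word by (relative frequency in OCR text) / (relative frequency in general text).

    `general_freq` maps a word to its relative frequency in everyday language.
    `smoothing` (an assumed constant) avoids division by zero for unseen words.
    """
    counts = Counter(word.lower() for word in ocr_words)
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        ocr_rel = count / total
        general_rel = general_freq.get(word, 0.0) + smoothing
        scores[word] = ocr_rel / general_rel
    return scores


def top_terms(scores, k=3):
    """Return the k words with the highest ratio, i.e. the likeliest technical terms."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: words as they might come out of OCR on compiler lecture slides.
ocr_words = ["the", "dominator", "tree", "of", "the", "dominator", "graph", "the"]
# Made-up general-language relative frequencies, for illustration only.
general_freq = {"the": 0.05, "of": 0.03, "tree": 0.0002, "graph": 0.0001}

scores = relative_frequency_scores(ocr_words, general_freq)
print(top_terms(scores, k=2))
```

In this sketch a function word such as "the" receives a low score despite being the most frequent OCR token, while "dominator", which is absent from the general frequency table, ranks first. How such candidate terms are then fed back into the ASR system is not specified in the abstract and is left out.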