Track: Archival regular
Keywords: ASR (Automatic Speech Recognition), STD (Spoken Term Detection), WER (Word Error Rate), SER (Sentence Error Rate)
Abstract: In the rapidly expanding digital landscape, the volume of audio files generated from sources such as the internet and social media has increased significantly. Efficiently locating specific spoken words within this vast collection of Amharic audio data is difficult. To address this, we propose a novel method that combines text-based Spoken Term Detection (STD) with an automatic speech recognition (ASR) model. Our methodology comprises speech segmentation with pydub, the development of an ASR model, and the implementation of keyword-based STD. The ASR model successfully transcribes audio files, allowing meaningful keywords to be extracted to support more accurate and more frequent search queries. An analysis of 37 audio files yields a sentence error rate (SER) of 91.7 percent (33 of 36 sentences contain errors) and a word error rate (WER) of 98.3 percent (285 of 290 words contain errors). The approach improved search accuracy and efficiency for specific spoken terms, significantly enhancing search capabilities for users of Amharic multimedia resources. However, the study highlights the need for a larger dataset to improve transcription quality and reduce errors. With such data, the approach has the potential to revolutionize Amharic audio search engines and help users access precise information in Amharic audio data, ultimately transforming how we interact with and use Amharic audio resources.
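The pipeline described in the abstract (transcribe audio with ASR, then run keyword-based STD over the resulting text) and its two error rates can be illustrated with a small Python sketch. This is not the authors' implementation. The tokenizer, the inverted index, and all function names (`tokenize`, `word_edit_distance`, `error_rates`, `build_index`, `search`) are hypothetical. The transcripts are toy strings rather than real ASR output, and the pydub segmentation step is omitted. WER is computed in the standard way, as word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, which is an assumption about how the reported 98.3 percent was scored. SER is the fraction of sentences containing at least one error, consistent with 33/36 ≈ 91.7 percent.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical tokenizer (assumption, not from the paper): splits on whitespace,
# the Ethiopic wordspace U+1361, the Ethiopic full stop U+1362, and ASCII punctuation.
_SEPARATORS = re.compile(r"[\s\u1361\u1362.,!?]+")


def tokenize(text: str) -> List[str]:
    """Split a transcript into word tokens, dropping empty strings."""
    return [tok for tok in _SEPARATORS.split(text) if tok]


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(refs: List[str], hyps: List[str]) -> Tuple[float, float]:
    """Return (WER, SER) over sentence-aligned reference/hypothesis pairs."""
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and aligned")
    total_edits = total_words = wrong_sentences = 0
    for ref_text, hyp_text in zip(refs, hyps):
        ref, hyp = tokenize(ref_text), tokenize(hyp_text)
        edits = word_edit_distance(ref, hyp)
        total_edits += edits
        total_words += len(ref)
        if edits > 0:  # any word error makes the whole sentence wrong
            wrong_sentences += 1
    return total_edits / total_words, wrong_sentences / len(refs)


def build_index(transcripts: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Inverted index: token -> list of (audio_id, segment_index) where it occurs."""
    index: Dict[str, List[Tuple[str, int]]] = {}
    for audio_id, segments in transcripts.items():
        for seg_idx, segment in enumerate(segments):
            for token in set(tokenize(segment)):  # record each token once per segment
                index.setdefault(token, []).append((audio_id, seg_idx))
    return index


def search(index: Dict[str, List[Tuple[str, int]]], keyword: str) -> List[Tuple[str, int]]:
    """Exact single-token keyword lookup over the transcript index."""
    return index.get(keyword, [])


if __name__ == "__main__":
    # Toy transcripts standing in for ASR output of segmented audio files.
    transcripts = {
        "clip_01.wav": ["ሰላም ኢትዮጵያ።", "አዲስ አበባ"],
        "clip_02.wav": ["ኢትዮጵያ ሰላም"],
    }
    index = build_index(transcripts)
    print(search(index, "ኢትዮጵያ"))  # [('clip_01.wav', 0), ('clip_02.wav', 0)]
    print(error_rates(["ሰላም ኢትዮጵያ", "አዲስ አበባ"],
                      ["ሰላም ኢትዮጵያ", "አዲስ"]))  # (0.25, 0.5)
```

Once transcripts are indexed, each keyword lookup is a single dictionary access, which is what makes text-based STD cheap relative to searching the audio directly. Multi-word terms such as "አዲስ አበባ" would need phrase matching over consecutive tokens, which this sketch does not attempt.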
Submission Number: 22