Keywords: ASR (Automatic Speech Recognition), STD (Spoken Term Detection), WER (Word Error Rate), SER (Sentence Error Rate)
TL;DR: Efficient Amharic Audio Data Search Engine
Abstract: The volume of audio generated from sources such as the internet and social media has grown significantly in the rapidly expanding digital landscape. Efficiently locating specific spoken words in this vast collection of Amharic audio data is difficult. To address this, we propose a novel method that combines text-based Spoken Term Detection (STD) with Automatic Speech Recognition (ASR) models. Our methodology includes speech segmentation with pydub, the development of an ASR model, and the implementation of keyword-based STD. The ASR model transcribes audio files so that meaningful keywords can be extracted for more accurate and frequent search queries. An analysis of 37 audio files reveals a sentence error rate (SER) of 91.7 percent (33 of 36 sentences contain errors) and a word error rate (WER) of 98.3 percent (285 of 290 words contain errors). We report improved search accuracy and efficiency for specific spoken terms, which strengthens search capabilities for users of Amharic multimedia resources. However, the study emphasizes the need for a larger dataset to improve transcription quality and reduce errors. With such improvements, the approach has the potential to revolutionize Amharic audio search engines and to help users access precise information from Amharic audio data.
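The reported error rates can be reproduced from the counts given in the abstract. The sketch below is only an illustration: the abstract does not state how errors were counted, so it assumes the common definitions (WER as word-level edit distance divided by the number of reference words, SER as the share of sentences with at least one error). The function names `edit_distance` and `error_rates` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(pairs):
    """Return (WER, SER) in percent for a non-empty list of (reference, hypothesis) strings."""
    word_errors = total_words = wrong_sentences = 0
    for reference, hypothesis in pairs:
        ref_words, hyp_words = reference.split(), hypothesis.split()
        distance = edit_distance(ref_words, hyp_words)
        word_errors += distance
        total_words += len(ref_words)
        wrong_sentences += distance > 0
    return 100 * word_errors / total_words, 100 * wrong_sentences / len(pairs)


# Percentages implied by the counts stated in the abstract.
reported_wer = 100 * 285 / 290  # 285 of 290 words contain errors
reported_ser = 100 * 33 / 36    # 33 of 36 sentences contain errors

# Toy transcripts with placeholder tokens (not data from the study).
toy_wer, toy_ser = error_rates([("a b c", "a x c"), ("d e", "d e")])
```

Rounded to one decimal place, `reported_wer` and `reported_ser` match the 98.3 and 91.7 percent stated above.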
Submission Number: 65