Efficient Amharic Audio Data Search Engine using Text-Based Spoken Term Detection with Models

ICLR 2024 Workshop GenAI4DM Submission 65 Authors

09 Feb 2024 (modified: 10 Feb 2024) | ICLR 2024 Workshop GenAI4DM Withdrawn Submission | CC BY 4.0

Keywords: ASR (Automatic Speech Recognition), STD (Spoken Term Detection), WER (Word Error Rate), SER (Search Error Rate)

TL;DR: Efficient Amharic Audio Data Search Engine

Abstract: The volume of audio files generated from sources such as the internet and social media has increased significantly in the rapidly expanding digital landscape, and it is difficult to efficiently locate specific spoken words in this vast collection of Amharic audio data. To address this, we propose a novel method that combines Text-Based Spoken Term Detection (STD) with models. Our methodology comprises speech segmentation with pydub, the development of an ASR model, and the implementation of keyword-based STD. The ASR model transcribes audio files, allowing meaningful keywords to be extracted from the transcripts for more accurate search queries. An analysis of 37 audio files reveals a sentence error rate (SER) of 91.7 percent (33 of 36 sentences have errors) and a word error rate (WER) of 98.3 percent (285 of 290 words have errors). The method improved search accuracy and efficiency for specific spoken terms, improving search capabilities for users of Amharic multimedia resources. However, the study emphasizes the need for a larger dataset to improve transcription capabilities and reduce errors. With such improvements, the approach has the potential to revolutionize Amharic audio search engines, help users access precise information from Amharic audio data, and transform how we interact with and use Amharic audio resources.
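The error-rate figures and the text-based search step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, file names, and sample transcripts are hypothetical. The abstract reports its rates as "units with errors out of total units", so `error_rate` follows that arithmetic. The edit-distance helper shows the more common WER numerator (substitutions, insertions, and deletions), which the submission may or may not have used. The search assumes whitespace-tokenized transcripts and exact word matching.

```python
from typing import Dict, List


def error_rate(errors: int, total: int) -> float:
    """Percentage of units (words or sentences) that contain errors."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


def word_edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def find_term(transcripts: Dict[str, str], term: str) -> Dict[str, List[int]]:
    """Return, per file, the word positions at which the query term occurs."""
    hits: Dict[str, List[int]] = {}
    for name, text in transcripts.items():
        positions = [i for i, word in enumerate(text.split()) if word == term]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Figures quoted in the abstract.
    print(round(error_rate(33, 36), 1))    # sentence error rate: 91.7
    print(round(error_rate(285, 290), 1))  # word error rate: 98.3

    # Hypothetical ASR transcripts keyed by a made-up file name.
    transcripts = {
        "clip_01.wav": "ሰላም ኢትዮጵያ",
        "clip_02.wav": "አዲስ አበባ",
    }
    print(find_term(transcripts, "ኢትዮጵያ"))  # {'clip_01.wav': [1]}
```

Because the search runs over ASR output, a query can only match words the recognizer transcribed correctly, which is why the reported WER bears directly on retrieval quality.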

Submission Number: 65
