Amharic Language Audio Data Search Engine

NAACL 2025 Workshop AISD Submission22 Authors

06 Feb 2025 (modified: 07 Feb 2025)NAACL 2025 Workshop AISD Desk Rejected SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Track: Archival regular
Keywords: ASR (Automatic Speech Recognition), STD (Spoken Term Detection), WER (Word Error Rate), SER (Search Error Rate)
Abstract: The generation of audio files from various sources, including the internet and social media, has increased significantly in the rapidly expanding digital landscape. It is difficult to efficiently access specific spoken words from this vast collection of Amharic audio data. To address this, we propose a novel method that combines Text-Based Spoken Term Detection (STD) with models. Our methodology includes speech segmentation with pydub, the development of an ASR model, and the implementation of keyword-based STD. The ASR model successfully transcribes audio files, allowing meaningful keywords to be extracted for more accurate and frequent search queries. An analysis of 37 audio files reveals that the sentence error rate (SER) is 91.7 percent (33 of 36 sentences have errors) and the word error rate (WER) is 98.3 percent (285 of 290 words have errors). It improved search accuracy and efficiency for specific spoken terms, significantly improving search capabilities for users of Amharic multimedia resources. However, the study emphasizes the need for a larger dataset to improve transcription capabilities and reduce errors, with the potential to revolutionize Amharic audio search engines and empower users in accessing precise information from Amharic audio data, ultimately transforming how we interact with and use Amharic audio resources.
Submission Number: 22
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview