Knowledge Management Framework Over Low Resource Indian Colloquial Language Audio Contents

Published: 01 Jan 2024, Last Modified: 05 Mar 2025, COMAD/CODS 2024, CC BY-SA 4.0
Abstract: We introduce Graama-Kannada Audio Search, a knowledge management framework for low-resource Indian colloquial language audio content. Rural communities hold rich knowledge, but because it is primarily oral, much of it is inaccessible on the internet. Organizations such as Namma Halli Radio have collected an audio corpus of a few hours of community interactions spoken in colloquial language. A simple and efficient platform to store, organize, and query the knowledge in such audio can be of great help, even to people with low literacy levels. In this work, we fine-tune state-of-the-art Automatic Speech Recognition (ASR) models on limited audio data to reduce the Word Error Rate (WER) on colloquial audio and obtain transcripts. We then build an interface that searches these transcripts for keywords, using a simple fuzzy matching technique for n-gram inputs.
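
The keyword search step can be illustrated with a minimal sketch. The abstract does not say which fuzzy matcher the framework uses, so this example assumes Python's standard-library `difflib.SequenceMatcher` as a stand-in. The function name `fuzzy_search`, the `threshold` parameter, and the sample transcripts are hypothetical. The query is treated as an n-gram of n words and compared against every n-word window of each transcript.

```python
from difflib import SequenceMatcher


def fuzzy_search(query, transcripts, threshold=0.8):
    """Return (audio_id, matched_phrase, score) tuples, best match first.

    Hypothetical helper: `transcripts` maps an audio id to its ASR transcript.
    The similarity measure (difflib ratio) is an assumption, not the paper's.
    """
    q_tokens = query.lower().split()
    if not q_tokens:
        return []
    n = len(q_tokens)
    q = " ".join(q_tokens)

    hits = []
    for audio_id, text in transcripts.items():
        tokens = text.lower().split()
        best = None
        # Slide an n-word window over the transcript and keep the best score.
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            score = SequenceMatcher(None, q, window).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (window, score)
        if best is not None:
            hits.append((audio_id, best[0], best[1]))

    # Highest-scoring audio clips first.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


# Hypothetical transliterated transcripts, for illustration only.
sample = {
    "clip1": "namma halli radio program",
    "clip2": "weather report today",
}
results = fuzzy_search("hali radio", sample)
```

Here the misspelled bigram query "hali radio" still retrieves `clip1` through its closest window, "halli radio", which is the kind of tolerance to transcription and spelling variation that fuzzy matching is meant to provide.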