UKoSpeech: A Universal Korean ASR System for Diverse Domains

ACL ARR 2024 June Submission 3610 Authors

16 Jun 2024 (modified: 22 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The rapid advancement of Automatic Speech Recognition (ASR) systems has transformed transcription workflows, greatly reducing the need for expert human intervention. Despite this progress and the emergence of robust models such as Whisper, significant challenges remain: the scarcity of non-English training data and poor adaptability to domain-specific contexts hinder broader application. This paper introduces UKoSpeech, a Korean ASR system designed to address these issues through a two-pronged approach: a Korean data curation pipeline that gathers domain-specific data from sources such as YouTube subtitles, and a domain-specific training framework that uses a domain prompt technique for improved adaptability. Our results indicate that UKoSpeech not only helps close the gap in non-English ASR research but also delivers superior domain-specific performance compared to established ASR systems such as Whisper, Google STT, and CLOVA Speech. Through extensive evaluation across diverse domains including finance, medicine, and law, UKoSpeech achieves state-of-the-art performance, establishing a new benchmark for domain-adaptable ASR systems.
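Illustrative note: the submission page does not describe how the domain prompt technique is implemented. The sketch below only shows the general idea of conditioning a Whisper-style model on a domain tag via decoder prompt tokens, using the Hugging Face transformers API; the checkpoint name, the "domain: finance" prompt string, and the dummy waveform are assumptions for illustration, not UKoSpeech's actual method.

```python
# Minimal sketch (assumptions, not the authors' implementation): bias a
# Whisper-style decoder toward a target domain by prepending a domain tag
# as a decoder prompt, in the spirit of the abstract's domain prompt idea.
import numpy as np
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Dummy 1-second, 16 kHz waveform stands in for a real Korean utterance.
waveform = np.zeros(16_000, dtype=np.float32)
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")

# Hypothetical domain tag ("finance", "medicine", "law", ...) encoded as
# prompt token ids that precede the transcription during decoding.
prompt_ids = processor.get_prompt_ids("domain: finance", return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        prompt_ids=prompt_ids,
        language="ko",
        task="transcribe",
    )
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```

In such a setup, the same backbone can serve multiple domains at inference time simply by swapping the prompt string, which is one plausible reading of the "domain prompt" adaptability claimed in the abstract.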
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: domain adaptation, automatic speech recognition, less-resourced languages
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Approaches to low-resource settings
Languages Studied: English, Korean
Submission Number: 3610