Open source platform for Estonian speech transcription

Aivo Olev, Tanel Alumäe

Published: 2025, Last Modified: 26 May 2026
Venue: Lang. Resour. Evaluation 2025
License: CC BY-SA 4.0
Abstract: This paper presents our progress in developing and maintaining a public speech and speaker recognition platform for the Estonian language. The platform consists of a speech processing pipeline and a web-based user interface through which end users can post-edit transcripts. It is offered free of charge as a public service and is in active use. The service provides significantly higher speech recognition accuracy than commercial alternatives. We discuss the switch to a workflow management system and how it has improved the core speech processing pipeline. The core systems behind the platform have been released as open-source code and deployed internally by multiple public and private institutions.