GiellaLT---a stable infrastructure for Nordic minority languages and beyondDownload PDF

Published: 20 Mar 2023, Last Modified: 17 Apr 2023NoDaLiDa 2023Readers: Everyone
Keywords: Infrastructure, Rule-Based, Spell-checking, Grammar-checking, Morphology, Corpora, TTS
TL;DR: We introduce a long-term language technology infrastructure that has been used for Nordic languages and more for over 20 years.
Abstract: Long term language technology infrastructures are critical for continued maintenance of language technology based software that is used to support the use of languages in digital world. In Nordic area we have languages ranging from well-resourced national majority languages like Norwegian, Swedish and Finnish as well as minoritised, unresourced and indigenous languages like Sámi languages. We present an infrastructure that has been build in over 20 years time that supports building language technology and tools for most of the Nordic languages as well as many of the languages all over the world, with focus on Sámi and other indigenous, minoritised and unresourced languages. We show that one common infrastructure can be used to build tools from keyboards and spell-checkers to machine translators, grammar checkers and text-to-speech as well as automatic speech recognition.
3 Replies

Loading