Abstract: Home assistants are essential today, but they typically support only widely spoken languages. Promoting products that support underrepresented languages is crucial for their preservation. Being able to use a home assistant in one’s native language, such as Catalan, is a significant step toward this goal. Keyword spotting (KWS) and speech recognition are two potential solutions. The lightweight architecture of KWS models makes them promising for low-power edge devices in domotic environments. However, resources for training such models are scarce, especially for Catalan. This paper presents a solution that uses forced alignment with speech-to-text models to extract any set of words from any speech resource. Although our focus is on Catalan, the methodology can be applied to other languages.