AI-based Personalized Feedback in Speech Therapy for People with Aphasia

University of Eastern Finland DRDHum 2024 Conference Submission 16

Published: 03 Jun 2024, Last Modified: 11 Aug 2024, DRDHum 2024, CC BY 4.0
Keywords: Aphasia, automatic speech recognition, speech and language therapy, digital health
TL;DR: Completed work on building a pipeline for a speech therapy app, including ASR and further text analysis.
Abstract: In the field of speech and language therapy, artificial intelligence has been used in diagnostics, therapy, and assistive systems for people with aphasia (PWA) (Adikari et al., 2023; Azevedo et al., 2024; Pottinger & Kearns, 2024). The AphaDIGITAL project (TDG, 2021) focuses on developing such a mobile application for German-speaking PWA that will provide personalized multilevel feedback with the help of Automatic Speech Recognition (ASR) and further text analysis. To build the corresponding pipeline (Rykova & Walther, 2024), the following questions are addressed.

Which existing ASR solutions are suitable for the task-specific speech of German-speaking PWA? More than 50 open-source ASR solutions were evaluated with the help of several speech recordings from different corpora (Rykova, Walther, & Zeuner, 2022). Thirteen models were selected and tested with atypical speech, including two small datasets of PWA's speech (Rykova & Walther, in press - a). Based on Character Error Rate (CER), HITS (the number of precisely recognized words), and the number of empty outputs, four open-source ASR models were selected for the pipeline (Fleck, 2022; Grosman, 2022; Guhr, 2022; NVIDIA, 2022). These models are, to a greater or lesser extent, robust to speaker gender and age. The experiments suggest that for better single-word recognition the audio samples should not be too short and should be pronounced neither too slowly nor too fast (i.e., intentionally sped up) (Rykova & Walther, in press - b).

How can the selected ASR solutions be improved and/or adapted for the purposes of speech and language therapy? In the absence of adequate data for (re)training ASR models, applying knowledge about non-standard (aphasic and dialect) phonetic features post hoc to the ASR output was attempted. Aphasic features included recognition of syllables as separate words and vowel prolongation. Dialect features were selected from the Thuringian-Upper Saxon dialect group (Wallraff, 2007; Rocholl, 2015; B. Siebenhaar, personal communication, January 2024). The method combined generating alternative pronunciations based on non-standard patterns (Masmoudi et al., 2014) and using the alternatives for evaluation (Ali et al., 2017), and it proved to work on recordings of naming and repetition tasks from a German aphasia test (Huber, 1993).

How can a combination of the selected ASR solutions and existing tools for semantic and grammatical analysis serve the analysis of speech production errors? If the speaker's answer is not recognized as fully correct or as containing phonetic/phonemic errors, it is subject to semantic analysis: it must be compared to the target in terms of their semantic relationship and distance. The current semantic analysis pipeline is built upon GermaNet, a semantic network for the German language (Hamp & Feldweg, 1997). It includes recognition of hyponymy/hypernymy, membership in the same semantic (sub)category, and different lexical and conceptual relationships derived from GermaNet. If the answer is not recognized as an existing word, a search for close orthographic matches is performed, and the match that is semantically closest to the target is subject to the relationship analysis described above. This approach has been tested and described in detail in Rykova & Walther (2023).
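The model selection criteria mentioned above (CER, HITS, empty outputs) can be made concrete with a minimal sketch; the functions below are illustrative assumptions and not taken from the project code. CER is computed here as a character-level edit distance normalized by reference length, and HITS as the number of reference words reproduced exactly.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance normalized by reference length."""
    ref, hyp = reference, hypothesis
    # Dynamic-programming edit distance over characters.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


def hits(reference: str, hypothesis: str) -> int:
    """Number of reference words reproduced exactly in the hypothesis."""
    hyp_words = hypothesis.lower().split()
    return sum(1 for word in reference.lower().split() if word in hyp_words)


def is_empty_output(hypothesis: str) -> bool:
    """Flag transcriptions for which the model produced no text at all."""
    return hypothesis.strip() == ""
```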
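The post-hoc handling of non-standard phonetic features could look roughly like the following sketch. The normalization steps and the dialect substitution rules are purely illustrative assumptions, not the rule set actually used in the project.

```python
import re

# Illustrative aphasic normalization: syllables recognized as separate words
# and vowel prolongation collapsed to a single vowel.
def normalize(form: str) -> str:
    form = form.lower().replace(" ", "")            # "Ba na ne" -> "banane"
    form = re.sub(r"([aeiouäöü])\1+", r"\1", form)  # "banaaane" -> "banane"
    return form

# Hypothetical dialect substitutions (loosely inspired by Thuringian-Upper Saxon lenition).
DIALECT_RULES = [("p", "b"), ("t", "d"), ("k", "g")]

def alternative_forms(target: str) -> set[str]:
    """Generate alternative surface forms of the target word from non-standard patterns."""
    forms = {normalize(target)}
    for standard, dialect in DIALECT_RULES:
        forms |= {form.replace(standard, dialect) for form in forms}
    return forms

def accept(asr_output: str, target: str) -> bool:
    """Accept the answer if its normalized form matches any alternative of the target."""
    return normalize(asr_output) in alternative_forms(target)
```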
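The semantic analysis step, including the orthographic fallback for out-of-lexicon answers, might be organized as in the sketch below. The GermaNet lookups are left as stubs because their exact interface depends on the GermaNet resource and wrapper used; only the close-match search via difflib is concrete, and the function names are assumptions.

```python
import difflib

# Hypothetical GermaNet helpers: in the pipeline these would be backed by GermaNet
# (Hamp & Feldweg, 1997); the signatures here are assumptions.
def semantic_relation(word: str, target: str) -> str:
    """Return e.g. 'hypernym', 'hyponym', 'same_subcategory', or 'unrelated'."""
    raise NotImplementedError

def semantic_distance(word: str, target: str) -> float:
    """Distance between two words in the semantic network (smaller = closer)."""
    raise NotImplementedError

def analyze_answer(answer: str, target: str, lexicon: list[str]) -> str:
    """Relation analysis with an orthographic fallback for out-of-lexicon answers."""
    if answer in lexicon:
        return semantic_relation(answer, target)
    # The answer is not an existing word: search for close orthographic matches ...
    candidates = difflib.get_close_matches(answer, lexicon, n=5, cutoff=0.7)
    if not candidates:
        return "unanalyzable"
    # ... and analyze the candidate that is semantically closest to the target.
    best = min(candidates, key=lambda c: semantic_distance(c, target))
    return semantic_relation(best, target)
```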
Submission Number: 16