Automatic Detection of Direct and Self Repetitions in Naturalistic Speech Recordings of French- and Dutch-Speaking Autistic Children
Abstract: This study investigates the use of cosine similarity measures across lexical, syntactic, and semantic vectors to detect direct and self-repetitions in the spontaneous speech of autistic children. The lexical and syntactic vectors were built with spaCy lemmatization and POS tagging, and the semantic vectors were constructed with Sentence Transformers (BERT).
Using extensively annotated datasets of French and Dutch autistic children's speech, we computed and evaluated thresholds obtained from the similarities between the vectors to distinguish between repetitive and non-repetitive utterances.
The results show that semantic and lexical similarity provide reliable cues for identifying self-repetitions, achieving high precision and recall scores. However, direct repetitions are more challenging to detect. Overall, the best models for the detection of both types of repetition are based on lexical and semantic similarities. By contrast, models based on syntactic similarity perform worse in all conditions.
Further research is needed to refine models for direct repetitions and explore their cross-linguistic applicability.
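As an illustration of the thresholding idea described in the abstract, the following minimal sketch compares two utterances with cosine similarity over bag-of-token count vectors and flags a repetition when the score reaches a cutoff. It is not the authors' implementation: the function names, the use of lowercased tokens in place of spaCy lemmas, and the threshold value of 0.8 are all assumptions made for the example.

```python
import math
from collections import Counter


def lexical_vector(tokens):
    """Build a bag-of-tokens count vector (hypothetical stand-in for lemma counts)."""
    return Counter(token.lower() for token in tokens)


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(vec_a[key] * vec_b[key] for key in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(value * value for value in vec_a.values()))
    norm_b = math.sqrt(sum(value * value for value in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_repetition(previous_tokens, current_tokens, threshold=0.8):
    """Flag the current utterance as a repetition of the previous one.

    The threshold of 0.8 is an arbitrary placeholder, not a value from the study.
    """
    score = cosine_similarity(
        lexical_vector(previous_tokens), lexical_vector(current_tokens)
    )
    return score >= threshold


if __name__ == "__main__":
    adult = ["le", "chat", "dort"]
    child = ["le", "chat", "dort"]
    print(is_repetition(adult, child))
```

The same comparison could in principle be applied to POS-tag sequences or sentence embeddings by swapping the vector-building step, with a separate threshold tuned for each vector type.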
Paper Type: Long
Research Area: Discourse and Pragmatics
Research Area Keywords: discourse relations, conversation, communication
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: French, Dutch
Submission Number: 1529