Automatic Classification of Parental Behaviors in Bilingual Datasets from In-Person and Telehealth Language Assessment
Abstract: Text-based behavioral coding is a labor-intensive process for clinicians, particularly when annotating complex bilingual data. This study evaluates the performance of four state-of-the-art (SOTA) large language models (LLMs) in automating the classification of parent behaviors within a bilingual dataset comprising 59 Mandarin-English child language assessment sessions (16 in-person and 43 telehealth). While the four LLMs (GPT-4, Llama-3, Qwen2, and DeepSeek-V3) achieved notable accuracy, they still fell short of the performance of bilingual human annotators. An error analysis further revealed that both the human annotators and the best-performing model overall, GPT-4, struggled to classify parental behaviors in categories involving complex task procedures, especially when analyzing bilingual code-mixed text. This study contributes to the understanding of how LLMs can be used to advance automated behavioral coding in bilingual child language assessments.
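To make the task concrete, the sketch below illustrates the general shape of such a pipeline: prompting an LLM to assign one behavior code per parent utterance, then scoring the model's labels against human annotations. This is a minimal illustration under stated assumptions, not the paper's actual method; the category names, prompt wording, and example utterance are hypothetical placeholders rather than the study's coding scheme.

```python
# Minimal sketch of an LLM-based behavioral-coding pipeline. The categories,
# prompt text, and utterances below are hypothetical illustrations, not the
# study's actual coding scheme or prompts.
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical parent-behavior categories (placeholders, not the study's codes).
CATEGORIES = ["prompting", "modeling", "reinforcement", "off-task"]

def build_prompt(utterance: str) -> str:
    """Format one code-mixed parent utterance as a single-label classification prompt."""
    return (
        "Classify the parent's behavior in this Mandarin-English child "
        "language assessment utterance into exactly one category: "
        f"{', '.join(CATEGORIES)}.\n"
        f"Utterance: {utterance}\n"
        "Answer with the category name only."
    )

# Toy code-mixed example; in the study, prompts like this would be sent to
# GPT-4, Llama-3, Qwen2, or DeepSeek-V3.
print(build_prompt("你看 this one, 这是什么 animal?"))

# Toy labels standing in for human-annotator gold codes vs. model predictions.
gold = ["prompting", "modeling", "prompting", "reinforcement", "off-task"]
pred = ["prompting", "modeling", "off-task", "reinforcement", "off-task"]

print(f"Accuracy: {accuracy_score(gold, pred):.2f}")
print(f"Cohen's kappa vs. human annotator: {cohen_kappa_score(gold, pred):.2f}")
```

Chance-corrected agreement such as Cohen's kappa, rather than raw accuracy alone, is the standard way to compare model output against human behavioral coders, since some categories occur far more often than others.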
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingual benchmarks, multilingualism, multilingual evaluation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English, Mandarin
Submission Number: 6527