Code-Switching Detection in Multilingual Child Speech with SwissBERT

23 Mar 2026 (modified: 19 May 2026), SwissText 2026 Conference Submission, Readers: Everyone, CC BY 4.0
Track: Scientific Track
Keywords: Word-level classification, Low-resource languages, Multilingual NLP, Subword-label alignment
TL;DR: This paper introduces a supervised word-level system for detecting code-switching in multilingual child speech using SwissBERT.
Abstract: Code-switching is widespread in multilingual speech, yet its automatic detection remains challenging, especially for low-resource languages. In Switzerland, where multiple languages coexist with Swiss German varieties, these challenges are amplified by variable orthography and limited annotated data. We present a supervised word-level language-identification system for code-switching detection in multilingual everyday child and adult speech, obtained by fine-tuning SwissBERT. We constructed a dataset covering four languages and an "other" category, implemented controlled subword–label alignment, and evaluated performance using token-level F1. To contextualize SwissBERT’s performance, we additionally fine-tuned mBERT as a multilingual baseline. SwissBERT achieves robust word-level predictions and outperforms mBERT. We release the full training pipeline and evaluation scripts to facilitate reproducibility.
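The abstract does not specify how the subword–label alignment is controlled, so the following is only a minimal sketch of one common strategy, not the authors' implementation: each word's label is assigned to its first subword, and the remaining subwords and special tokens receive an ignore index so they do not contribute to the loss. The function name `align_labels`, the integer label ids, and the example tokenization are hypothetical; the `word_ids` list is assumed to have the shape returned by `word_ids()` on a Hugging Face fast-tokenizer encoding.

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_subwords: bool = False,
) -> List[int]:
    """Map word-level labels onto subword tokens.

    word_ids: for each subword token, the index of the word it belongs to,
        or None for special tokens (assumed input format).
    word_labels: one integer label per word (hypothetical label ids).
    label_all_subwords: if True, continuation subwords copy the word label;
        otherwise they are masked with IGNORE_INDEX.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            # Special tokens never carry a language label.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[wid])
        else:
            # Continuation subword of the same word.
            aligned.append(word_labels[wid] if label_all_subwords else IGNORE_INDEX)
        previous = wid
    return aligned


# Hypothetical example: three words, the second split into two subwords.
example_word_ids = [None, 0, 1, 1, 2, None]
example_word_labels = [0, 2, 0]
print(align_labels(example_word_ids, example_word_labels))
# [-100, 0, 2, -100, 0, -100]
```

Masking continuation subwords keeps the number of scored positions equal to the number of words, which is one way to keep an F1 score computed at the word level even though the model operates on subwords.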
Submission Number: 41