Early Guessing for Dialect Identification

Anonymous

16 Jan 2022 (modified: 05 May 2023)
ACL ARR 2022 January Blind Submission
Readers: Everyone
Abstract: This paper addresses incremental dialect identification: reliably determining the dialect before the full utterance has been given as input. Most previous research on dialect identification has been model-centric, with a focus on performance. We address a new question: how much input is needed to identify a dialect? Our approach is a data-centric analysis that yields general criteria for finding the shortest input needed to make a plausible guess. Working with two sets of dialects (Swiss German and Indo-Aryan languages), we show that the dialect can be identified well before the end of the input utterance. To determine the optimal point for making the first guess, we propose a heuristic that combines calibrated model confidence (temperature scaling) and input length. We show that the same input-shortening criteria apply to both of our data sets. Although performance with early guesses remains below performance on the full input, the gap is smaller when the overall performance of the fine-tuned model is better.
Paper Type: short
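
The abstract names two ingredients for the early-guess heuristic: calibrated model confidence (temperature scaling) and input length. The sketch below shows one way such a rule could be combined; it is an illustration under stated assumptions, not the authors' actual criteria. The function names, the `logits_fn` interface, and the default values for `temperature`, `min_confidence`, and `min_length` are all hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def calibrated_probs(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature scaling: softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def early_guess(
    tokens: Sequence[str],
    logits_fn: Callable[[Sequence[str]], Sequence[float]],  # hypothetical model interface
    temperature: float = 1.5,     # hypothetical; normally fit on held-out data
    min_confidence: float = 0.9,  # hypothetical confidence threshold
    min_length: int = 3,          # hypothetical minimum prefix length
) -> Optional[Tuple[int, int, float]]:
    """Return (prefix_length, label_index, confidence) for the shortest prefix
    that is long enough and confidently classified, or None if there is none."""
    for n in range(min_length, len(tokens) + 1):
        probs = calibrated_probs(logits_fn(tokens[:n]), temperature)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= min_confidence:
            return n, label, probs[label]
    return None
```

In practice `logits_fn` would wrap a fine-tuned classifier applied to the input prefix, and the temperature would be chosen on a validation set so that confidence values track accuracy; both are left abstract here.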