Abstract: Developing a morphophonological processing system for morphologically complex languages such as Arabic poses significant challenges. Deep learning models have been successful at this task, but their performance depends heavily on the amount of annotated data. Creating large datasets is time-consuming, difficult, and expensive, especially for low-resource languages such as the Arabic dialects. Furthermore, not all annotated samples contribute useful information for training. Active learning addresses these issues by guiding the learning algorithm to select informative samples for annotation. Research on applying active learning to morphophonological processing is limited, and this paper introduces a novel combination of meta-learning and active learning for the task. To the best of our knowledge, no prior research focuses on combining these approaches. Experiments on Egyptian Arabic show that performance similar to that of the state-of-the-art model trained on the entire dataset can be achieved with only approximately 23% of the annotated data. Notably, our proposed method outperforms existing successful deep active learning methods.
Paper Type: short
Research Area: Phonology, Morphology and Word Segmentation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Arabic
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.