SUDANESE ARABIC DIALECT ENCODING USING XLM-RoBERTa LANGUAGE MODEL: Zol-ROBERTA

01 Mar 2023 (modified: 11 Apr 2023) | Submitted to Tiny Papers @ ICLR 2023 | Readers: Everyone
Keywords: Sudanese Arabic Dialect, Modern Standard Arabic, XLM-RoBERTa, Zol-RoBERTa, Sentiment Analysis, NLU
TL;DR: Language Model for Sudanese Arabic Dialect
Abstract: XLM-RoBERTa has proven very effective at Natural Language Understanding (NLU), achieving state-of-the-art results on most NLU tasks. In this work we aim to bring the power of XLM-RoBERTa to the Sudanese Arabic dialect. We collected over 6 million sentences in the Sudanese dialect and used them to continue training the pre-trained XLM-RoBERTa, which was originally trained on 2.5 TB of data across 100 languages filtered from Common Crawl. Our model, Zol-RoBERTa, is expected to achieve better performance on Sudanese sentiment analysis, which would indicate that it better understands the Sudanese dialect, the domain we are targeting.
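Continued pre-training of this kind reuses XLM-RoBERTa's masked language modeling objective. The abstract does not give the training configuration, so the following is only a minimal sketch of the token-masking step, assuming the conventional 15% masking rate with the 80/10/10 replacement rule. The function name `mask_tokens` and its parameters are hypothetical and not taken from the paper.

```python
import random

MASK_TOKEN = "<mask>"  # mask token string used by the XLM-RoBERTa tokenizer


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Build one masked-language-modeling example (hypothetical helper).

    Returns (corrupted, labels). labels[i] is the original token at
    positions selected for prediction and None everywhere else.
    The 15% rate and 80/10/10 split are assumptions, not paper settings.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict this token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK_TOKEN)         # 80%: replace with mask
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)                # 10%: keep unchanged
        else:
            labels.append(None)  # position ignored by the loss
            corrupted.append(tok)
    return corrupted, labels
```

In practice the masking would be applied afresh to each batch of the collected Sudanese sentences after subword tokenization, and the pre-trained weights would be updated only through the loss on the selected positions.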