A Generative Adversarial Network for Data Augmentation: The Case of Arabic Regional Dialects

Published: 24 Jul 2021, Last Modified: 24 Apr 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0
Abstract: Text generation using Generative Adversarial Networks (GANs) has been successful in domains such as sentiment analysis, notably with the Sentimental GAN (SentiGAN) model. We adopt a similar model to generate sentences for five regional Arabic dialects (Egypt, Gulf, Maghreb, Levant, and Iraq). The objective is to overcome the scarcity of richly annotated Dialectal Arabic (DA) datasets by generating such corpora automatically. For a given dialect, the DA generation process relies on a generator to create new text and a discriminator to evaluate that text, with a dynamic update that allows the process to run automatically without supervision. Novelty and diversity are the two metrics used to verify the consistency and quality of the generated DA text before it is added to the target datasets. Experimental results confirm the reliability and value of the generated datasets when tested with different classifiers.
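
The abstract names novelty and diversity as quality checks but does not define them. As a minimal sketch, assuming a Jaccard-based formulation of the kind often used for GAN-generated text, novelty can be scored as one minus a generated sentence's highest similarity to any training sentence, and diversity as one minus its highest similarity to any other generated sentence. The function names and the toy placeholder tokens below are hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sentences are treated as identical
    return len(a & b) / len(a | b)


def novelty(sentence, training_corpus):
    """1 - max similarity to any training sentence (assumed definition)."""
    best = max((jaccard(sentence, ref) for ref in training_corpus), default=0.0)
    return 1.0 - best


def diversity(index, generated):
    """1 - max similarity to any *other* generated sentence (assumed definition)."""
    others = (s for j, s in enumerate(generated) if j != index)
    best = max((jaccard(generated[index], s) for s in others), default=0.0)
    return 1.0 - best


# Toy tokenized sentences; placeholder tokens stand in for dialect words.
training = [["a", "b", "c"], ["d", "e"]]
generated = [["a", "b", "x"], ["d", "e"], ["p", "q"]]

novelty_scores = [novelty(s, training) for s in generated]
diversity_scores = [diversity(i, generated) for i in range(len(generated))]
```

Under this assumed definition, a generated sentence that copies a training sentence scores zero novelty, while one sharing no tokens with the training set scores one. A filtering step could keep only sentences above chosen thresholds before adding them to a dataset; whether the paper applies such thresholds is not stated in the abstract.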