AraT5GQA: Arabic Question Answering model using automatic generated dataset

Ahmad M Mustafa

Published: 22 Aug 2024, Last Modified: 30 Sept 2024 | 2024 15th International Conference on Information and Communication Systems (ICICS) | Everyone | CC BY 4.0

Abstract: Arabic question answering models have gained significant importance in recent years, particularly for institutions and enterprises that need customer service, search engines, or other applications capable of handling inquiries about private data. These entities often prefer to train their own models in order to keep their private or protected information confidential. In this study, we introduce a methodology for training an AraT5 large language model. The model is first fine-tuned on a benchmark dataset and then trained on an automatically generated dataset. The results show that the model trained on the generated dataset achieves an F1-score of 0.77, compared with the 0.88 F1-score obtained by the model trained exclusively on the benchmark dataset. The small gap between the two scores indicates that the proposed approach is effective when only automatically generated training data is available.
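
The abstract reports F1-scores but does not say how they are computed. As a point of reference only, the sketch below shows the token-overlap F1 commonly used for question answering, where a predicted answer is compared with a reference answer token by token. The function name `token_f1` and the plain whitespace tokenization are assumptions made for illustration; the paper's evaluation may normalize Arabic text differently or use another F1 variant.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: tokens are whitespace-separated, with no further
    normalization (no diacritic or punctuation stripping).
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the paper's datasets.
    print(token_f1("عاصمة مصر القاهرة", "القاهرة"))
```

A corpus-level score of this kind is usually the mean of `token_f1` over all question and answer pairs, so figures such as 0.77 and 0.88 would be averages rather than single-example values.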