BCoQA: Benchmark and Resources for Bangla Context-based Conversational Question Answering

ACL ARR 2024 June Submission 4200 Authors

16 Jun 2024 (modified: 13 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Developing a Bangla Context-based Conversational Question Answering (CCQA) system presents unique challenges, including limited domain-specific data, inadequate translation methods, and a lack of pretrained language models. In this work, we address these obstacles by constructing a Bangla CCQA dataset through quality-controlled machine translation and LLM-based augmentation of established English CCQA datasets, and then partitioning it into training, validation, and test splits. We fine-tune several existing sequence-to-sequence models on the training split and evaluate them on the test split, appending the conversation history to the input prompt to preserve context. The entire dataset and the testing script are publicly available on GitHub for benchmarking future models. This work is a step toward advancing conversational AI for Bangla and lays a foundation for further research and development in the field.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Question Answering
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Bengali, English
Submission Number: 4200
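
The abstract mentions appending conversation history to the input prompt so that a sequence-to-sequence model can resolve follow-up questions. The sketch below illustrates one way such an input string could be assembled. It is not the authors' implementation: the function name build_input, the field labels ("context:", "question:", "answer:"), the separator, and the max_turns cutoff are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_input(
    passage: str,
    history: List[Tuple[str, str]],
    question: str,
    max_turns: int = 3,
    sep: str = " | ",
) -> str:
    """Assemble a single prompt string from a passage, prior turns, and the current question.

    All field labels and the separator are hypothetical; the paper's abstract
    only states that conversation history is appended to the input prompt.
    """
    # Keep only the most recent turns so the prompt stays within model limits.
    # The explicit guard is needed because history[-0:] would return every turn.
    recent = history[-max_turns:] if max_turns > 0 else []

    parts = [f"context: {passage}"]
    for prev_question, prev_answer in recent:
        parts.append(f"question: {prev_question}")
        parts.append(f"answer: {prev_answer}")
    # The current question comes last, with no answer attached.
    parts.append(f"question: {question}")
    return sep.join(parts)


if __name__ == "__main__":
    turns = [("Q1", "A1"), ("Q2", "A2")]
    print(build_input("P", turns, "Q3", max_turns=1))
```

Limiting the history to the last few turns is one common way to keep the prompt within a model's maximum input length; whether the paper truncates history, and how, is not stated in the abstract.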