Understanding QA Generation: Extracting Parametric and Contextual Knowledge with CQA for Low-Resource Bangla Language
Abstract: Question-Answering (QA) models for low-resource languages like Bangla face challenges due to limited annotated data and linguistic complexity. A key open issue is determining whether models rely more on pre-encoded (parametric) knowledge or on contextual input during answer generation, as existing Bangla QA datasets lack the structure required for such analysis. We introduce BanglaCQA, the first counterfactual QA dataset in Bangla, constructed by integrating counterfactual passages and answerability annotations into an existing dataset. In addition, we propose prompting-based pipelines for LLMs to disentangle parametric and contextual knowledge in both factual and counterfactual scenarios. Furthermore, we apply LLM-based evaluation techniques that measure answer quality based on semantic similarity. Our work not only introduces a novel framework for analyzing knowledge sources in Bangla QA but also uncovers critical findings that open up broader directions for counterfactual reasoning in low-resource language settings.
Paper Type: Short
Research Area: Question Answering
Research Area Keywords: Question Answering, Resources and Evaluations, Language Modeling
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources
Languages Studied: Bangla
Submission Number: 2650