Building efficient and effective OpenQA systems for low-resource languages

Published: 01 Jan 2024, Last Modified: 03 Oct 2024, Knowl. Based Syst. 2024, CC BY-SA 4.0
Abstract: Highlights
• We show OpenQA is feasible in low-resource language contexts without gold labels for training.
• Key enabler of OpenQA in low-resource languages: weak supervision and unstructured data.
• We demonstrate that only a few hundred gold examples suffice to evaluate OpenQA.
• Growing knowledge sources impact OpenQA results based on retrievers’ noise-handling capability.
• We release SQuAD-TR, a large-scale Turkish QA dataset derived from SQuAD2.0.