Keywords: knowledge base question answering, multilingual question answering, machine reading comprehension, evaluation resources, Russian language resources
Abstract: We describe the second version of RuBQ, a Russian dataset for knowledge base question answering (KBQA) over Wikidata. Whereas the first version builds on Q&A pairs harvested online, the extension is based on questions obtained through search engine query suggestion services. The questions underwent crowdsourced and in-house annotation that differed substantially from the procedure used for the first edition. The freely available RuBQ 2.0 contains 2,910 questions along with answers and SPARQL queries, i.e., the dataset has doubled in size. The dataset also incorporates answer-bearing paragraphs from Wikipedia for the majority of questions. RuBQ 2.0 is suitable for evaluating KBQA, machine reading comprehension (MRC), and hybrid question answering models, as well as for semantic parsing research. We provide an analysis of the dataset and report several KBQA and MRC baseline results.
First Author Is Student: Yes