One Question Answering Model for Many Languages with Cross-lingual Dense Passage RetrievalDownload PDF

May 21, 2021 (edited Nov 09, 2021)NeurIPS 2021 PosterReaders: Everyone
  • Keywords: Question answering, Multilingual NLP, Open-domain QA, Cross-lingual Information Retrieval
  • TL;DR: We present CORA, a Cross-lingual Open-Retrieval Answer Generation model that can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
  • Abstract: We present Cross-lingual Open-Retrieval Answer Generation (CORA), the first unified many-to-many question answering (QA) model that can answer questions across many languages, even for ones without language-specific annotated data or knowledge sources. We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question. Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any translation or in-language retrieval modules as used in prior work. We propose an iterative training method that automatically extends annotated data available only in high-resource languages to low-resource ones. Our results show that CORA substantially outperforms the previous state of the art on multilingual open QA benchmarks across 26 languages, 9 of which are unseen during training. Our analyses show the significance of cross-lingual retrieval and generation in many languages, particularly under low-resource settings.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: https://github.com/AkariAsai/CORA
14 Replies

Loading