TL;DR: This paper presents XNLIeu, an extension of the XNLI dataset to Basque, a low-resource language, along with a series of baselines for Basque cross-lingual NLI.
Abstract: The XNLI dataset, a benchmark for Natural Language Inference (NLI), is widely used to assess cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this paper, we extend XNLI to include Basque, a low-resource language that can greatly benefit from transfer-learning approaches. XNLIeu was developed by first machine-translating the English XNLI corpus into Basque and then manually post-editing the result. We conduct a series of experiments using mono- and multilingual LLMs to assess (a) the effect of professional post-editing compared with the output of the automatic MT system, (b) the best cross-lingual strategy for NLI in Basque, and (c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation. The results show that post-editing is crucial and that the translate-train cross-lingual strategy obtains better results overall, although the gain is smaller when models are tested on a dataset built natively from scratch.
Paper Type: long
Research Area: Semantics: Sentence-level Semantics, Textual Inference and Other areas
Contribution Types: Data resources, Data analysis
Languages Studied: Basque