Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

Elisa Bassignana; Filip Ginter; Sampo Pyysalo; Rob van der Goot; Barbara Plank

Published: 20 Mar 2023, Last Modified: 04 May 2025 | NoDaLiDa 2023 | Readers: Everyone

Keywords: Relation Extraction, dataset, multi-lingual, multi-domain

TL;DR: A new dataset for multi-lingual and multi-domain Relation Extraction.

Abstract: Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with those on the original English CrossRE, indicating high quality of the translation and the resulting dataset.

Student Paper: Yes, the first author is a student

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/multi-crossre-a-multi-lingual-multi-domain/code)
