MultiMUC: Multilingual Template Filling on MUC-4Download PDF

Anonymous

16 Oct 2023ACL ARR 2023 October Blind SubmissionReaders: Everyone
Abstract: We introduce MultiMUC, the first multilingual parallel corpus for template filling, comprising translations of the classic MUC-4 template filling benchmark into five languages: Arabic, Chinese, Farsi, Korean, and Russian. We obtain automatic translations from a strong multilingual machine translation system and manually project the original English annotations into each target language. For all languages, we also provide human translations for key portions of the dev and test splits. Finally, we present baselines on MultiMUC both with state-of-the-art template filling models for MUC-4 and with ChatGPT. We release MUC-4 and the supervised baselines to facilitate further work on document-level information extraction in multilingual settings.
Paper Type: long
Research Area: Information Extraction
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Arabic,Chinese,English,Farsi,Korean,Russian
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.
0 Replies

Loading