MELA: Multilingual Evaluation of Linguistic Acceptability

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: We present the first multilingual linguistic acceptability benchmark and showcase some of its potential uses, including evaluating LLMs and serving as a testbed for cross-lingual transfer experiments.
Abstract: In this work, we present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability---MELA, with 48K samples covering 10 languages from a diverse set of language families. We establish LLM baselines on this benchmark and investigate cross-lingual transfer in acceptability judgements with XLM-R. In pursuit of multilingual interpretability, we analyze the weights of fine-tuned XLM-R to explore the possibility of identifying transfer difficulty between languages. Our results show that GPT-4 performs on par with fine-tuned XLM-R, while open-source instruction-finetuned multilingual models lag behind by a notable gap. Cross-lingual and multi-task learning experiments show that, unlike semantic tasks, acceptability judgements crucially depend on in-language training data. We also conduct edge probing to compare the syntactic capacities of base XLM-R and MELA-finetuned XLM-R. Probing results indicate that training on MELA improves the performance of XLM-R on syntax-related probing tasks. Our dataset will be made publicly available upon acceptance.
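The fine-tuned XLM-R baseline described in the abstract is a standard binary sequence-classification setup. Below is a minimal sketch (not the authors' code) of how such a baseline could be trained with Hugging Face Transformers; the file names (mela_train.csv, mela_dev.csv) and column names ("sentence", "label") are hypothetical placeholders, since the released MELA format may differ.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import matthews_corrcoef
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Hypothetical data files with a "sentence" column and a binary "label"
# column (1 = acceptable, 0 = unacceptable).
data = load_dataset("csv", data_files={"train": "mela_train.csv",
                                       "validation": "mela_dev.csv"})

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Acceptability benchmarks (e.g. CoLA) conventionally report the
    # Matthews correlation coefficient.
    logits, labels = eval_pred
    return {"mcc": matthews_corrcoef(labels, np.argmax(logits, axis=-1))}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-acceptability",
                           per_device_train_batch_size=32,
                           num_train_epochs=3),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

The same setup extends to the cross-lingual transfer experiments mentioned above by training on one language's split and evaluating on another's.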
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Data resources
Languages Studied: English, Chinese, Italian, Russian, German, French, Spanish, Japanese, Arabic, Icelandic