Large Language Model is not a (Multilingual) Compositional Relation Reasoner

Published: 10 Jul 2024, Last Modified: 26 Aug 2024 · COLM · CC BY 4.0
Research Area: Evaluation
Keywords: Benchmarks, Composition Relation
TL;DR: We establish a benchmark for evaluating LLMs' compositional relation reasoning
Abstract: We present a comprehensive evaluation of large language models' capability to reason about compositional relations through a benchmark encompassing 1,800 test cases in both English and Chinese, covering six distinct categories of composition relations: Positional, Comparative, Personal, Mathematical, Identity, and Other. We extend our assessment to the multilingual realm by including translations of the benchmark suite into Japanese, French, and Korean. Our Multilingual Composition Relation (MCR) benchmark aims to investigate the robustness and adaptability of LLMs in handling compositional relation reasoning across diverse linguistic contexts.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 407