Research Area: Evaluation
Keywords: Benchmarks, Composition Relation
TL;DR: We establish a benchmark for evaluating LLMs' composition relation reasoning
Abstract: We present a comprehensive evaluation of large language models'
capability to reason about composition relations through
a benchmark encompassing 1,800 test cases in both English and Chinese,
covering six distinct categories of composition relations:
Positional, Comparative, Personal, Mathematical, Identity, and Other.
We extend our assessment to the multilingual realm by including translations of the benchmark suite into
Japanese, French, and Korean.
Our Multilingual Composition Relation (MCR) benchmark
aims to investigate the robustness and adaptability of LLMs in handling composition relation reasoning across diverse linguistic contexts.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 407