Research Area: Evaluation
Keywords: Benchmarks, Composition Relation
TL;DR: We establish a benchmark for evaluating LLMs' composition relation reasoning
Abstract: We present a comprehensive evaluation of large language models'
capability to reason about composition relations through
a benchmark encompassing 1,800 test cases in both English and Chinese,
covering six distinct categories of composition relations:
Positional, Comparative, Personal, Mathematical, Identity, and Other.
We extend our assessment to the multilingual realm by including translations of the benchmark suite into
Japanese, French, and Korean.
Our Multilingual Composition Relation (MCR) benchmark
aims to investigate the robustness and adaptability of LLMs in handling composition relation reasoning across diverse linguistic contexts.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 407