Can Large Langauge Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE

Published: 01 Jan 2023, Last Modified: 25 Jan 2025ALP@RANLP 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Large language models (LLMs) have demonstrated exceptional language understanding and generation capabilities. However, their ability to comprehend ancient languages, specifically ancient Chinese, remains largely unexplored. To bridge this gap, we introduce ACLUE, an evaluation benchmark designed to assess the language abilities of models in relation to ancient Chinese. ACLUE consists of 15 tasks that cover a range of skills, including phonetic, lexical, syntactic, semantic, inference and knowledge. By evaluating 8 state-of-the-art multilingual and Chinese LLMs, we have observed a significant divergence in their performance between modern Chinese and ancient Chinese. Among the evaluated models, ChatGLM2 demonstrates the highest level of performance, achieving an average accuracy of 37.45%. We have established a leaderboard for communities to assess their models.
Loading