OpenHuEval: Evaluating Large Language Model on Hungarian Specifics

ACL ARR 2025 February Submission6821 Authors

16 Feb 2025 (modified: 09 May 2025) · License: CC BY 4.0
Abstract: We introduce OpenHuEval, the first benchmark designed for comprehensive evaluation of large language models (LLMs) on the Hungarian language and its specifics. OpenHuEval incorporates the latest design principles for evaluating LLMs, such as using real user queries from the internet, emphasizing the assessment of LLMs' generative capabilities, and employing LLM-as-judge to enhance the multidimensionality and accuracy of evaluations. We evaluated current mainstream LLMs, including both traditional LLMs and recently developed Large Reasoning Models (LRMs). The results demonstrate a significant need for evaluation and model optimization tailored to the Hungarian language and its specifics. We also conducted a detailed analysis of the reasoning process of LRMs on OpenHuEval, revealing the intrinsic patterns and mechanisms of these models in non-English languages, with Hungarian serving as a representative example. We will release OpenHuEval on GitHub.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation
Contribution Types: Data resources, Data analysis
Languages Studied: Hungarian, English
Submission Number: 6821