Abstract: Large Language Models (LLMs) such as ChatGPT have made significant strides in Natural Language Processing and are widely praised for their multitasking ability. However, as demand for cross-lingual applications grows, inconsistency in the responses LLMs give across languages has become increasingly apparent, particularly for knowledge-based queries. This study presents an in-depth evaluation of the cross-lingual consistency of the knowledge embedded in LLMs. Existing research on knowledge-based cross-lingual consistency is scarce and has notable limitations. To address these shortcomings, we construct a factual knowledge dataset based on Wikidata, spanning five domains and twelve languages. We further propose a set of metrics for evaluating the cross-lingual consistency of knowledge, covering cross-lingual semantic consistency, cross-lingual accuracy consistency, and cross-lingual timeliness consistency. Using this dataset and these metrics, we conduct a comprehensive evaluation and analysis of six representative open-source and closed-source models. The source code will be made publicly available for further research.
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, French, Spanish, Chinese, Russian, Japanese, Italian, German, Portuguese, Korean, Greek, Dutch