Abstract: Since the advent of the GPT-3.5 model, numerous large language models (LLMs) have emerged in China. As their user bases grow, the security of these models has attracted extensive attention from researchers. However, existing evaluation samples lack comprehensiveness and diversity, and evaluations are typically conducted in a single language. In this paper, we introduce SafeLLMs, a benchmark for assessing the security of LLMs. We propose seven evaluation dimensions and compile a dataset of 19,763 evaluation samples to support a comprehensive assessment. We evaluate 16 models under both zero-shot and three-shot settings, in both Chinese and English, and analyze the disparities in security performance across models when processing the two languages. We find ample room for improvement in dimensions such as Bias and Discrimination. We believe the findings of this evaluation can help improve the security performance of LLMs. Our evaluation dataset and results will be released soon.