Research on Security Assessment and Safety Hazards Optimization of Large Language Models

15 Aug 2024 (modified: 21 Aug 2024) — IEEE ICIST 2024 Conference Submission — CC BY 4.0
Abstract: This study investigated the performance of mainstream large language models on Chinese security generation tasks, explored the potential security risks of large language models, and proposed improvement strategies. The Multidimensional Security Question Answering (MSQA) dataset and Multidimensional Security Scoring Criteria (MSSC) were developed, and the performance of three models was compared across six security tasks. Pearson correlation analysis was performed between GPT-4 scores and questionnaire responses, and automatic scoring was implemented based on GPT-3.5-Turbo and Llama-3. Experimental results show that ERNIE Bot performs well on ideology and ethics evaluation, ChatGPT performs well on rumor/false-information and privacy security evaluation, and Claude performs well on factual-fallacy and social-bias evaluation. The fine-tuned model performs well on security scoring tasks, and the proposed Security Tips Expert (ST-GPT) can effectively reduce security hazards. All models carry security risks. It is recommended that both domestic and foreign models comply with the legal frameworks of their respective countries, reduce AI hallucinations, continuously expand their corpora, and undergo corresponding updates and iterations.
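The Pearson correlation analysis mentioned in the abstract (between GPT-4's automatic scores and human questionnaire ratings) can be sketched as below. The score values here are invented for illustration only; the actual MSQA/MSSC data and scoring scale are not given in this abstract.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical ratings: GPT-4 automatic scores vs. averaged questionnaire scores
gpt4_scores = [4.5, 3.0, 2.5, 4.0, 3.5]
human_scores = [4.0, 3.5, 2.0, 4.5, 3.0]
print(round(pearson_r(gpt4_scores, human_scores), 3))
```

A high positive coefficient (close to 1) would indicate that the automatic GPT-4 scoring agrees with human judgments, which is the validation step the abstract describes.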
Submission Number: 176