Hidden Vulnerabilities: The Knowledge Degradation in Fine-Tuned Large Language Models

ACL ARR 2024 June Submission2214 Authors

15 Jun 2024 (modified: 02 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: As real-world applications often require fine-tuning for better downstream performance, we investigate the impact of such instruction fine-tuning on the general performance of large language models (LLMs). Using standard LLM benchmarks, we observe significant degradation on tasks requiring more complex and compositional skills, as represented by the BBH benchmark. In contrast, the model's general capability for Knowledge Retrieval, as indicated by MMLU scores across various domains, remains relatively stable. Our findings shed light on a general degradation in model performance that is not confined to a specific domain but is more closely related to the type of capability involved; in this paper we benchmark two such capabilities: Knowledge Retrieval and Knowledge Reasoning. Furthermore, we examine how the fine-tuning training data affects performance by comparing the effects of training on knowledge-compatible data versus knowledge-conflict data across different benchmark datasets.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Fine-tuning, Robustness, Generalization
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 2214