Abstract: As language models continue to advance and accessible interfaces to them proliferate, large language models (LLMs) are being applied in an ever-growing number of fields. However, because model development requires vast amounts of data and computational resources, protecting a model's parameters and training data has become an urgent and crucial concern.
Owing to the revolutionary training and application paradigms of LLMs, many new attacks on language models have emerged in recent years. In this paper, we collectively term these attacks ``reverse engineering'' (RE) of language models and provide an in-depth analysis of them. We survey reverse-engineering methods that target different aspects of a model, and we also introduce existing protective strategies.
On the one hand, this survey demonstrates that even black-box models are vulnerable to different types of attacks; on the other hand, it offers a more holistic perspective for developing new protective strategies for models.
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: security and privacy
Contribution Types: Surveys
Languages Studied: English
Submission Number: 5025