Pre-Trained Model-Based Automated Software Vulnerability Repair: How Far are We?

Quanjun Zhang; Chunrong Fang; Bowen Yu; Weisong Sun; Tongke Zhang; Zhenyu Chen

Pre-Trained Model-Based Automated Software Vulnerability Repair: How Far are We?

Quanjun Zhang, Chunrong Fang, Bowen Yu, Weisong Sun, Tongke Zhang, Zhenyu Chen

Published: 01 Jan 2024, Last Modified: 13 Nov 2024IEEE Trans. Dependable Secur. Comput. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Various approaches are proposed to help under-resourced security researchers to detect and analyze software vulnerabilities. It is still incredibly time-consuming and labor-intensive for security researchers to fix such reported vulnerabilities due to the increasing size and complexity of modern software systems. The time lag between the reporting and fixing of a security vulnerability causes software systems to suffer from significant exposure to possible attacks. Very recently, some techniques propose to apply pre-trained models to fix security vulnerabilities and have proved their success in improving repair accuracy. However, the effectiveness of existing pre-trained models has not been systematically compared and little is known about their advantages and disadvantages. To bridge this gap, we perform the first extensive study on applying various pre-trained models to automated vulnerability repair. The experimental results on two vulnerability datasets show that all studied pre-trained models consistently outperform the state-of-the-art technique VRepair with a prediction accuracy of 32.94% $\sim$ 44.96%. We also investigate the impact of three major phases (i.e., data pre-processing, model training and repair inference) in the vulnerability repair workflow. Inspired by the findings, we construct a simplistic vulnerability repair approach that adopts the transfer learning from bug fixing. Surprisingly, such a simplistic approach can further improve the prediction accuracy of pre-trained models by 9.40% on average. Besides, we provide additional discussion from different aspects (e.g., code representation and a preliminary study with ChatGPT) to illustrate the capacity and limitation of pre-trained model-based techniques. Finally, we further pinpoint various practical guidelines (e.g., the improvement of fine-tuning) for advanced pre-trained model-based vulnerability repair in the near future. Our study highlights the promising future of adopting pre-trained models to patch real-world security vulnerabilities and reduce the manual debugging effort of security experts in practice.

Loading