NegCPARBP: Enhancing Privacy Protection for Cross-Project Aging-Related Bug Prediction Based on Negative Database
Abstract: The emergence of Aging-Related Bugs (ARBs) poses a significant challenge to software systems, resulting in performance degradation and increased error rates in resource-intensive systems. Consequently, numerous ARB prediction methods have been developed to mitigate these issues. However, when training data is limited, the effectiveness of ARB prediction is often suboptimal. To address this problem, Cross-Project Aging-Related Bug Prediction (CPARBP) has been proposed, which uses data from other projects (i.e., source projects) to train a model that predicts potential ARBs in a target project. However, the use of source-project data raises privacy concerns and discourages companies from sharing their data. Therefore, we propose a privacy-protection method called Cross-Project Aging-Related Bug Prediction based on Negative Database (NegCPARBP). NegCPARBP first converts the feature vector of a software file into a binary string. Second, it generates the corresponding Negative DataBase (NDB) from this binary string; the NDB contains data that differs significantly from the original feature vector. Furthermore, to ensure more accurate prediction of ARB-prone and ARB-free files from privacy-protected data (i.e., to maintain data utility), we propose a novel negative database generation algorithm that captures more information about important features, using information gain as the measure of importance. Finally, NegCPARBP extracts a new feature vector from the NDB to represent the original feature vector, which supports both data sharing and ARB prediction. Experimental results on the Linux, MySQL, and NetBSD datasets demonstrate that NegCPARBP provides a strong defense against attacks (privacy protection performance reaching 0.97) and better data utility than existing privacy protection methods.
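To make the first two steps concrete, the following is a minimal sketch of encoding a feature vector as a binary string and generating a negative database for it. The abstract does not specify the encoding or the generation algorithm, so everything here is an assumption: `to_binary_string` uses a hypothetical min-max quantization, and `generate_ndb` is a generic K-hidden-style generator rather than the information-gain-weighted algorithm the paper proposes. All function names, parameters, and probabilities are illustrative.

```python
import random


def to_binary_string(features, lows, highs, bits=4):
    """Quantize each feature to `bits` bits and concatenate the results.

    Assumed encoding: min-max scaling into integer levels 0 .. 2**bits - 1.
    """
    levels = (1 << bits) - 1
    chunks = []
    for x, lo, hi in zip(features, lows, highs):
        ratio = 0.0 if hi == lo else (x - lo) / (hi - lo)
        ratio = min(1.0, max(0.0, ratio))  # clamp out-of-range values
        chunks.append(format(round(ratio * levels), f"0{bits}b"))
    return "".join(chunks)


def generate_ndb(hidden, n_entries, probs=(0.7, 0.2, 0.1), k=3, seed=0):
    """Generate NDB entries over {'0', '1', '*'} for the hidden string.

    Each entry fixes k positions; between 1 and k of them are flipped
    relative to `hidden`, so the hidden string never matches any entry.
    `probs[i]` is the (assumed) weight for flipping i + 1 positions.
    """
    rng = random.Random(seed)
    n = len(hidden)
    ndb = []
    for _ in range(n_entries):
        positions = rng.sample(range(n), k)
        n_diff = rng.choices(range(1, k + 1), weights=probs)[0]
        flipped = set(rng.sample(positions, n_diff))
        entry = ["*"] * n
        for pos in positions:
            bit = hidden[pos]
            if pos in flipped:
                bit = "1" if bit == "0" else "0"
            entry[pos] = bit
        ndb.append("".join(entry))
    return ndb


def matches(entry, s):
    """Return True if binary string s is covered by the NDB entry."""
    return all(e == "*" or e == c for e, c in zip(entry, s))


hidden = to_binary_string([3.0, 10.0], lows=[0.0, 0.0], highs=[15.0, 20.0])
ndb = generate_ndb(hidden, n_entries=20)
print(any(matches(entry, hidden) for entry in ndb))
```

Because every entry disagrees with the hidden string in at least one fixed position, the final line prints `False`: the database describes strings other than the original. A feature-aware variant could bias `positions` toward the bits of features with higher information gain, which is the direction the abstract describes, but that weighting is not shown here.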