Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection

Van Nguyen; Trung Le; Olivier de Vel; Paul Montague; John C Grundy; Dinh Phung

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Withdrawn Submission, Readers: Everyone

Keywords: Domain adaptation, Cyber security, Software vulnerability detection, Machine learning, Deep learning

TL;DR: We propose a new approach to transfer learning from labeled to unlabeled software projects for software vulnerability detection (SVD) that resolves the mode collapsing problem faced by previous approaches.

Abstract: Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and in the field of computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects, since labeling requires laborious manual inspection of code by software security experts. One possible way to address this issue is to employ deep domain adaptation, which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. The general idea is to map both source and target data into a joint feature space and close the discrepancy gap between them in this space. The generative adversarial network (GAN) is a technique that attempts to bridge the discrepancy gap and has also emerged as a building block for deep domain adaptation approaches with state-of-the-art performance. However, deep domain adaptation approaches that use the GAN principle to close the discrepancy gap are subject to the mode collapsing problem, which negatively impacts predictive performance. In this paper we propose the Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for transfer learning from labeled to unlabeled software projects in the context of SVD, with the aim of resolving the mode collapsing problem faced by previous approaches. Experimental results on real-world software projects show that our proposed method outperforms state-of-the-art baselines by a wide margin.
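
The general idea in the abstract (one shared mapping into a joint feature space, plus a GAN-style discriminator that tries to tell source from target features) can be illustrated with a minimal sketch. This is not the authors' Dual-GD-DDAN; it is a generic single-generator, single-discriminator objective of the kind the abstract says is prone to mode collapse. All function names, the one-layer tanh feature map, and the linear discriminator are assumptions made only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feature_map(x, gen_weights):
    """Hypothetical shared generator: one linear layer followed by tanh.

    Maps a source or target input vector into the joint feature space.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in gen_weights]


def domain_prob(feat, disc_weights):
    """Hypothetical discriminator: probability that a feature is from the source."""
    return sigmoid(sum(w * f for w, f in zip(disc_weights, feat)))


def adversarial_losses(source, target, gen_weights, disc_weights, eps=1e-12):
    """Return (discriminator_loss, generator_loss) for one batch.

    The discriminator is trained to output 1 on source features and 0 on
    target features. The generator is trained to make target features look
    like source features, which is what closes the discrepancy gap.
    """
    src_p = [domain_prob(feature_map(x, gen_weights), disc_weights) for x in source]
    tgt_p = [domain_prob(feature_map(x, gen_weights), disc_weights) for x in target]

    d_loss = -(
        sum(math.log(p + eps) for p in src_p) / len(src_p)
        + sum(math.log(1.0 - p + eps) for p in tgt_p) / len(tgt_p)
    )
    g_loss = -sum(math.log(p + eps) for p in tgt_p) / len(tgt_p)
    return d_loss, g_loss


if __name__ == "__main__":
    # Toy inputs standing in for embedded code from two projects (made up).
    source_batch = [[1.0, 2.0], [0.5, 0.1]]
    target_batch = [[0.5, -1.0], [2.0, 0.3]]
    gen_w = [[0.1, 0.2], [0.3, -0.4]]
    disc_w = [0.7, -0.2]
    print(adversarial_losses(source_batch, target_batch, gen_w, disc_w))
```

In a full training loop the two losses would be minimized alternately, the first with respect to the discriminator weights and the second with respect to the generator weights. When the discriminator weights are all zero it outputs 0.5 for every input, so the two losses reduce to approximately 2 ln 2 and ln 2, which is a convenient sanity check.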
