VulFinder: A Multi-Agent-Driven Test Generation Framework for Guiding Vulnerability Reachability Analysis
Keywords: software supply chain, reachability of vulnerabilities, multi-agent, reduce false alarm and missed alarm
Abstract: Reusing third-party components in the software supply chain (SSC) may introduce vulnerabilities. After a new vulnerability in a third-party component is disclosed, developers need to determine whether their project is affected, an assessment that demands substantial manpower and resources. Current approaches mainly rely on dependency-based tools and genetic algorithm-based methods to assess vulnerability reachability in the SSC. However, these methods suffer from several issues: they ignore whether the vulnerable code is actually invoked, resulting in high false positive rates; they are limited to certain vulnerabilities, leading to high false negative rates; and they are confined to the Java ecosystem.
To overcome these challenges, we propose VulFinder, a multi-agent-driven framework for validating vulnerability reachability. VulFinder begins by using static code analysis tools to construct function call paths between downstream applications and the vulnerable APIs of their dependencies. Leveraging a multi-agent mechanism comprising a distillator, discriminator, generator, and validator, VulFinder iteratively generates exploit tests for methods along these call paths and validates vulnerability reachability by executing the tests on downstream applications. By integrating the code comprehension capabilities of large language models (LLMs) with the multi-agent framework, VulFinder addresses the coverage limitations of existing tools, reduces false alarms and missed alarms, and demonstrates robust generalizability across multiple programming languages.
Experiments show that VulFinder achieves a 21% accuracy improvement over the state-of-the-art tool on the Java dataset and generalizes robustly to the Python dataset, significantly reducing false positives and false negatives while delivering an average efficiency improvement of more than 1.5×.
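As an illustration of the first step described in the abstract (constructing function call paths between a downstream application and a vulnerable dependency API), the following is a minimal sketch, not VulFinder's actual implementation. It assumes the call graph has already been extracted by a static analysis tool and is available as a dictionary mapping each function to its callees; the function name `find_call_path` and all identifiers in the toy graph are hypothetical.

```python
from collections import deque


def find_call_path(call_graph, entry, vulnerable_api):
    """Breadth-first search for the shortest call path from `entry`
    to `vulnerable_api`. Returns a list of function names, or None
    if the vulnerable API cannot be reached from the entry point."""
    queue = deque([[entry]])
    visited = {entry}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == vulnerable_api:
            return path
        for callee in call_graph.get(current, ()):
            if callee not in visited:
                visited.add(callee)
                queue.append(path + [callee])
    return None


# Hypothetical call graph: keys are callers, values are their callees.
call_graph = {
    "app.main": ["app.load_config", "app.handle_request"],
    "app.handle_request": ["lib.parse"],
    "lib.parse": ["lib.unsafe_deserialize"],
    "app.load_config": [],
}

path = find_call_path(call_graph, "app.main", "lib.unsafe_deserialize")
print(path)
# ['app.main', 'app.handle_request', 'lib.parse', 'lib.unsafe_deserialize']
```

A statically found path like this only marks a candidate: it does not show that the vulnerable code can actually be triggered, which is why the abstract describes generating and executing tests along the path as a separate validation step.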
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 15699