Identifying Affected Libraries and Their Ecosystems for Open Source Software Vulnerabilities

Susheng Wu, Wenyan Song, Kaifeng Huang, Bihuan Chen, Xin Peng

Published: 01 Jan 2024, Last Modified: 13 Nov 2024ICSE 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Software composition analysis (SCA) tools have been widely adopted to identify vulnerable libraries used in software applications. Such SCA tools depend on a vulnerability database to know affected libraries of each vulnerability. However, it is labor-intensive and error prone for a security team to manually maintain the vulnerability database. While several approaches adopt extreme multi-label learning to predict affected libraries for vulnerabilities, they are practically ineffective due to the limited library labels and the unawareness of ecosystems.To address these problems, we first conduct an empirical study to assess the quality of two fields, i.e., affected libraries and their ecosystems, for four vulnerability databases. Our study reveals notable inconsistency and inaccuracy in these two fields. Then, we propose Holmes to identify affected libraries and their ecosystems for vulnerabilities via a learning-to-rank technique. The key idea of Holmes is to gather various evidences about affected libraries and their ecosystems from multiple sources, and learn to rank a pool of libraries based on their relevance to evidences. Our extensive experiments have shown the effectiveness, efficiency and usefulness of Holmes.