Automatic Commit Range Identification of Untagged Version

Published: 01 Jan 2024, Last Modified: 20 Aug 2025, APSEC 2024, CC BY-SA 4.0
Abstract: Aligning software product versions to commits is essential for fixing vulnerabilities in released versions. Existing approaches rely on tags in the code repository. In practice, however, many software versions widely used in IT companies are reported to have high-risk vulnerabilities yet carry no indicator information (i.e., tags) in their source code repositories. This makes it difficult to trace specific versions to their commits and fix the vulnerabilities effectively. In this paper, we first study software released on the Maven repository and hosted on GitHub. We collect and analyze statistics on versions that are reported to have high-risk vulnerabilities but have no explicit information for locating the commit at which they were released. To locate the commits at which a specific version was released, we propose a novel approach named ContAlign and compare it comprehensively with three baselines built on the two most common strategies: time-based and range-based. Experimental results on our dataset indicate that ContAlign achieves an accuracy of 0.89 in identifying a commit range that covers the true release commit of a specific version, improving on the baselines by 50.3%-102.2%. We also conduct a human study with 10 participants to evaluate the performance and usefulness of ContAlign; the user feedback indicates that ContAlign can effectively help participants align vulnerable versions to commits in the code repository.