HGNNLink: recovering requirements-code traceability links with text and dependency-aware heterogeneous graph neural networks

Published: 01 Jan 2025, Last Modified: 22 Jul 2025Autom. Softw. Eng. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Manually recovering traceability links between requirements and code artifacts often consumes substantial human resources. To address this, researchers have proposed automated methods based on textual similarity between requirements and code artifacts, such as information retrieval (IR) and pre-trained models, to determine whether traceability links exist between requirements and code artifacts. However, in the same system, developers often follow similar naming conventions and repeatedly use the same frameworks and template code, resulting in high textual similarity between code artifacts that are functionally unrelated. This makes it difficult to accurately identify the corresponding code artifacts for requirements artifacts solely based on textual similarity. Therefore, it is necessary to leverage the dependency relationships between code artifacts to assist in the requirements-code traceability link recovery process. Existing methods often treat dependency relationships as a post-processing step to refine textual similarity, overlooking the importance of textual similarity and dependency relationships in generating requirements-code traceability links. To address these limitations, we proposed Heterogeneous Graph Neural Network Link (HGNNLink), a requirements traceability approach that uses vectors generated by pre-trained models as node features and considers IR similarity and dependency relationships as edge features. By employing a heterogeneous graph neural network, HGNNLink aggregates and dynamically evaluates the impact of textual similarity and code dependencies on link generation. The experimental results show that HGNNLink improves the average F1 score by 13.36% compared to the current state-of-the-art (SOTA) method GA-XWCoDe in a dataset collected from ten open source software (OSS) projects. HGNNLink can extend IR methods by using high similarity candidate links as edges, and the extended HGNNLink achieves a 2.48% improvement in F1 compared to the original IR method after threshold parameter configuration using a genetic algorithm.
Loading