Abstract: GitHub is one of the most popular hosting platforms for open-source projects, where tags are widely used to facilitate software organization and retrieval. However, inadequate and low-quality tags on GitHub make it hard for users to search for and retrieve the projects they want. In this paper, we propose MF-TagRec, an automatic tag recommendation method for projects that extracts multiple features from the Readme documents, programming languages, and dependency package tags of projects. We capture the topics and global semantics of Readme documents as text features, and represent programming languages and dependency package tags as word vector features. We construct a convolutional neural network and feed it the text and word vector features to predict the most relevant tags for projects that are untagged or have only a few tags. We evaluate MF-TagRec on a real-world dataset, GitHubDepDataSet, against five baselines. The results show that MF-TagRec achieves a Recall@5 of 0.756 and a Recall@10 of 0.864, outperforming the baselines.
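The abstract reports Recall@5 and Recall@10 without defining them. A common definition in tag recommendation is the fraction of a project's ground-truth tags that appear among the top-k recommended tags, averaged over all evaluated projects. The sketch below assumes that definition, which may differ in detail from the one used in the paper; the function names and the toy tags are hypothetical and are not taken from MF-TagRec.

```python
from typing import Sequence, Set


def recall_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int) -> float:
    """Share of one project's true tags found in its top-k recommendations.

    Assumed definition: |top-k ∩ true| / |true|.
    """
    if not true_tags:
        return 0.0  # assumption: projects without ground-truth tags score 0
    top_k = set(ranked_tags[:k])
    return len(top_k & true_tags) / len(true_tags)


def mean_recall_at_k(
    all_ranked: Sequence[Sequence[str]],
    all_true: Sequence[Set[str]],
    k: int,
) -> float:
    """Average the per-project Recall@k over all evaluated projects."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_ranked, all_true)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): ranked recommendations and true tags for two projects.
ranked = [["python", "ml", "web"], ["go", "cli", "docker"]]
truth = [{"python", "cli"}, {"go", "cli"}]
print(mean_recall_at_k(ranked, truth, k=2))
```

Under this definition, a Recall@10 of 0.864 would mean that, on average, about 86% of a project's true tags appear among its ten highest-ranked recommendations.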