An entropy-based measure of fork diversity and its correlations with open source software projects' received contributions
Abstract: The fork-and-pull-based method is an important way for open-source software (OSS) projects to receive contributions. In this study, we introduce a novel metric called fork entropy, inspired by measures of biodiversity, to quantify the diversity of an OSS project’s forks beyond their simple count. Based on Rao’s quadratic entropy, the metric measures how diversely the forks change the project’s files. We validate the proposed metric through empirical studies on 102 OSS projects from the GitHub and GitLab platforms. The results show significant correlations between a project’s fork entropy and the contributions it receives, in terms of external productivity, acceptance rate of external pull requests, and number of reported bugs. Our findings also reveal significant interactions between fork entropy and other factors, such as the number of forks. Furthermore, a time-shift correlation analysis suggests that the historical impact of fork entropy, along with other control variables, persists for up to twenty months. Based on these insights, we propose to predict a project’s received contributions from fork entropy and other control variables using both a classic linear ARMAX model (Autoregressive Moving Average with Exogenous Variables) and a deep, Transformer-based prediction model. Compared with making predictions from current data only, including historical data gives both models higher prediction accuracy and faster convergence. In summary, this work presents a comprehensive study of the correlations and temporal dependencies between the diversity of an OSS project’s forks, measured by the proposed fork entropy, and the contributions it receives. These findings provide insights that can help project maintainers and contributors understand and coordinate their forking practices.
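Rao’s quadratic entropy for a set of items with relative abundances p_i and pairwise dissimilarities d_ij is Q = Σ_i Σ_j d_ij p_i p_j. The sketch below computes Q over a project’s forks to show how the measure differs from a plain fork count. It is illustrative only and rests on assumptions not taken from the paper: each fork is represented by the set of files it changed, the weights (e.g. a per-fork activity count) are hypothetical inputs, and the Jaccard distance between changed-file sets stands in for d_ij. The paper’s exact choices of weights and distance may differ.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Assumed dissimilarity d_ij: 1 - |A ∩ B| / |A ∪ B| over changed-file sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rao_quadratic_entropy(forks: dict, weights: dict) -> float:
    """Rao's quadratic entropy Q = sum_i sum_j d_ij * p_i * p_j.

    forks:   fork name -> set of file paths the fork changed (assumed representation)
    weights: fork name -> non-negative weight (hypothetical, e.g. commit count)
    """
    total = sum(weights[f] for f in forks)
    if total == 0:
        return 0.0
    # Normalize weights into relative abundances p_i that sum to 1.
    p = {f: weights[f] / total for f in forks}
    q = 0.0
    # d_ii = 0 and d_ij = d_ji, so sum over unordered pairs and double each term.
    for a, b in combinations(forks, 2):
        q += 2.0 * jaccard_distance(forks[a], forks[b]) * p[a] * p[b]
    return q


if __name__ == "__main__":
    # Three equally weighted forks: two touch the same file, one touches another.
    forks = {"fork1": {"src/a.py"}, "fork2": {"src/a.py"}, "fork3": {"docs/b.md"}}
    weights = {"fork1": 1, "fork2": 1, "fork3": 1}
    print(rao_quadratic_entropy(forks, weights))
```

Here Q = 2 × (1/9 + 1/9) = 4/9 ≈ 0.444, whereas three forks that all change the same file would give Q = 0 despite having the same fork count. When every pair of forks is maximally dissimilar (d_ij = 1 for i ≠ j), Q reduces to the Gini-Simpson index 1 − Σ p_i².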