Abstract: In this work, we present a novel dataset specifically designed for predicting pull request (PR) outcomes with large language models (LLMs). In contrast to earlier techniques that rely on purely numerical datasets, ours is the first to integrate textual and code-related features, enabling the use of LLMs for PR outcome prediction. To construct this dataset, we collected and carefully filtered PR data from six well-known repositories on GitHub, the largest platform for collaborative code development. The dataset consists of 300 PRs, each labeled with a 'green' or 'red' flag indicating whether the PR was merged or rejected. Each PR is annotated with key features such as its title, body, comments, contributor statistics, code changes, and related issues. The ratio of merged to unmerged PRs in the dataset is approximately 2:1. To promote reproducibility and foster further research, we will publicly release the dataset. This work lays the groundwork for building intelligent systems that assist in PR review and decision-making by leveraging the capabilities of LLMs.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: NLP datasets, benchmarking, corpus creation, evaluation methodologies, LLM efficiency, prompting, reproducibility, software and tools
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1304