Abstract: Automated program repair (APR) tools generally use a test suite to localize bugs and validate patches. A generated patch may pass the test suite yet still be incorrect, a phenomenon known as overfitting. To better understand the relationship between code coverage and overfitting, we quantify the reduction in overfitting achieved, once 100% branch coverage is reached, by adding tests that cover branches multiple times. We also investigate whether such additional tests increase the chances that a generated patch is exact, meaning that the patched program is syntactically identical to the original, high-quality correct code in our dataset. Our experiments used three test suites, each covering every branch in the code at least 1, 3, or 5 times, respectively. We ran seven well-known APR tools for Java on a dataset of buggy programs equipped with formal specifications; using formal methods allows us to check for overfitting reliably and objectively. Our experimental results indicate that enlarging the test suite beyond 100% branch coverage reduces overfitting. However, beyond a certain threshold, expanding the test suite to cover branches repeatedly yields no further reduction in overfitting.