FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation

ACL ARR 2025 May Submission2268 Authors

19 May 2025 (modified: 03 Jul 2025)ACL ARR 2025 May SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Abstract: Testing is essential to modern software engineering for building reliable software. Given the high costs of manually creating test cases, automated test case generation, particularly methods utilizing large language models, has become increasingly popular. These neural approaches generate semantically meaningful tests that are more maintainable compared with traditional automatic testing methods like fuzzing. However, the diversity and volume of unit tests in current datasets are limited, especially for newer but important languages. In this paper, we present a novel data augmentation technique, FuzzAug, that introduces the benefits of fuzzing to large language models by introducing valid testing semantics and providing diverse coverage-guided inputs. Doubling the size of training datasets, FuzzAug improves the performances from the baselines significantly. This technique demonstrates the potential of introducing prior knowledge from dynamic software analysis to improve neural test generation, offering significant enhancements in neural test generation.
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Data Augmentation, Unit Test Generation, Programming Languages
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Programming Language, Rust
Keywords: Data Augmentation, Unit Test Generation, Programming Languages
Submission Number: 2268
Loading