Abstract: Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, bit-level randomized mutation strategies limit a fuzzer's performance on structured data. Specialized fuzzers can handle specific structured formats, but they require additional grammar-specification effort and suffer from low throughput. In this paper, we explore the potential of Large Language Models (LLMs) to enhance greybox fuzzing for structured data. We leverage the LLM's pre-trained knowledge of data conversion and formats to generate new valid inputs, and we further strengthen its understanding of structured formats and mutation strategies by fine-tuning on paired mutation seeds. Our LLM-enhanced fuzzer, LLAMAFUZZ, integrates the LLM's ability to understand and mutate structured data into the fuzzing loop.
On standard fuzzing benchmarks, LLAMAFUZZ outperformed the top competitor by 41 bugs on average.
LLAMAFUZZ also achieved competitive results against specialized grammar-based fuzzers. On real-world programs, it attained significantly higher branch coverage than the baseline AFL++ on 11 of 15 targets. Lastly, we present case studies that explain how LLMs enhance the fuzzing process in terms of code coverage.
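To make the fine-tuning setup concrete, the following is a minimal sketch of how paired mutation seeds might be hex-encoded into prompt/completion examples; the function names, prompt template, and example bytes are illustrative assumptions, not the paper's exact pipeline.

# Hypothetical sketch: turning (pre-mutation, post-mutation) seed pairs into
# hex-string training examples for LLM fine-tuning. Names and the prompt
# template are assumptions for illustration only.

def seed_to_hex(seed_bytes: bytes) -> str:
    """Render a raw seed as a space-separated hex string the LLM can tokenize."""
    return " ".join(f"{b:02x}" for b in seed_bytes)

def make_training_pair(seed_before: bytes, seed_after: bytes) -> dict:
    """Build one supervised example from a paired mutation seed."""
    return {
        "prompt": "Mutate the following seed and keep it structurally valid:\n"
                  + seed_to_hex(seed_before),
        "completion": seed_to_hex(seed_after),
    }

if __name__ == "__main__":
    before = bytes.fromhex("25504446")    # e.g. the "%PDF" magic of a PDF seed
    after = bytes.fromhex("25504446ff")   # a mutated variant of the same seed
    print(make_training_pair(before, after))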
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: security/privacy
Contribution Types: NLP engineering experiment
Languages Studied: Hex representation, programming language, natural language
Submission Number: 2610