Keywords: WAF, PPO
TL;DR: We design a Web Application Firewall (WAF) evasion environment and apply PPO to learn to generate evasions for SQL injection payloads.
Abstract: Web Application Firewalls (WAFs) are widely deployed to protect web servers from security threats such as SQL injection. WAF products employ various techniques, e.g., syntax signatures and machine learning, to detect and block suspicious web traffic. However, no WAF is absolutely secure: there is always room for adversaries to craft malicious messages that evade detection. To date, most evasion techniques have been developed manually, which requires considerable labour and expertise. In this work, we explore automating WAF evasion with reinforcement learning. We create a reinforcement learning environment (based on OpenAI gym) for WAF evasion tasks and evaluate several mainstream WAF products with the Proximal Policy Optimization (PPO) algorithm. Our framework discovered a number of evasion payloads for each WAF in our experiments and significantly outperformed the baseline policy. Finally, we extract common patterns from the discovered evasion payloads and discuss weaknesses and flaws of existing WAF products, as well as suggested improvements. (Our 5-minute video presentation can be accessed at https://bit.ly/3gGwfBa with password DE@DB33F)
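
As a rough illustration of the kind of environment the abstract describes, the Python sketch below mimics the reset/step interface of OpenAI gym without depending on it. Everything in it is a hypothetical stand-in: the toy signature check (`toy_waf_blocks`), the mutation actions, and the 0/1 reward are assumptions made for illustration, not the authors' environment and not the behaviour of any real WAF product.

```python
import re

# Hypothetical toy "WAF": a single naive, case-sensitive signature.
# Real WAF products are far more sophisticated; this is only a stand-in.
SIGNATURE = re.compile(r"UNION SELECT")


def toy_waf_blocks(payload: str) -> bool:
    """Return True if the toy signature matches the payload."""
    return SIGNATURE.search(payload) is not None


# Hypothetical mutation actions (assumed, not taken from the paper).
def no_op(payload: str) -> str:
    return payload


def alternate_case(payload: str) -> str:
    # Upper-case characters at even indices, lower-case at odd indices.
    return "".join(
        c.lower() if i % 2 else c.upper() for i, c in enumerate(payload)
    )


def space_to_comment(payload: str) -> str:
    # Replace each space with an inline SQL comment.
    return payload.replace(" ", "/**/")


ACTIONS = [no_op, alternate_case, space_to_comment]


class ToyWafEnv:
    """Gym-style episode: mutate a seed payload until the toy WAF passes it."""

    def __init__(self, seed_payload: str, max_steps: int = 5):
        self.seed_payload = seed_payload
        self.max_steps = max_steps
        self.payload = seed_payload
        self.steps = 0

    def reset(self) -> str:
        self.payload = self.seed_payload
        self.steps = 0
        return self.payload

    def step(self, action: int):
        # Apply the chosen mutation, then query the toy WAF.
        self.payload = ACTIONS[action](self.payload)
        self.steps += 1
        evaded = not toy_waf_blocks(self.payload)
        reward = 1.0 if evaded else 0.0  # assumed sparse reward
        done = evaded or self.steps >= self.max_steps
        return self.payload, reward, done, {}
```

An agent such as PPO would choose the `action` index at each step; here the observation is simply the current payload string, whereas a trainable policy would need some numeric encoding of it, which this sketch leaves out.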