CTFers: LLMs that Learn to Hack and Defend

10 Nov 2025 (modified: 12 Nov 2025) · THU 2025 Fall AML Submission · CC BY 4.0
Keywords: LLM, RL, CTF
Abstract: Capture-the-Flag (CTF) competitions offer a challenging benchmark for testing AI reasoning in cybersecurity. However, large language models (LLMs) still perform far below expert level on these tasks. This project proposes CTFers, a reinforcement learning framework that trains LLMs to solve CTF challenges autonomously and to compete against other LLMs in adversarial settings. Using structured feedback, sandboxed environments, and Group Relative Policy Optimization (GRPO), CTFers enables models to iteratively refine their exploitation and defense strategies. Our goal is to bridge the gap between static, prompt-based reasoning and adaptive cyber reasoning, advancing toward self-improving AI agents capable of mastering both offensive and defensive security tasks.
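To make the role of GRPO more concrete, the sketch below shows its core idea: sample a group of rollouts for the same challenge, score each one, and use each rollout's reward relative to its group (mean-centered, divided by the group standard deviation) as its advantage, with no learned value critic. The abstract does not specify CTFers' reward, so the binary "flag captured" reward and the names `flag_reward` and `group_relative_advantages` are hypothetical illustrations, not the project's implementation. This sketch uses the population standard deviation; some GRPO implementations use the sample estimate instead.

```python
from statistics import mean, pstdev


def flag_reward(submitted: str, expected: str) -> float:
    """Hypothetical binary reward (an assumption, not from the source):
    1.0 if the rollout submitted the correct flag, else 0.0."""
    return 1.0 if submitted.strip() == expected else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: center each reward on its group's mean and
    scale by the group's (population) standard deviation. eps avoids
    division by zero when every rollout in the group scores the same."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One group of four sampled rollouts for the same (hypothetical) challenge.
    expected = "flag{example}"
    submissions = ["flag{example}", "flag{wrong}", "flag{example}", ""]
    rewards = [flag_reward(s, expected) for s in submissions]
    advantages = group_relative_advantages(rewards)
    print(rewards)
    print([round(a, 3) for a in advantages])
```

Rollouts that capture the flag receive positive advantages and failed ones negative, so the policy update favors successful exploitation traces. If every rollout in a group succeeds (or every one fails), all advantages are zero and that group contributes no learning signal, which is one reason challenge difficulty matters for this kind of training.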
Submission Number: 53