CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution

ICLR 2026 Conference Submission13539 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Readers: Everyone · License: CC BY 4.0

Keywords: LLM Agentic System, Capture the Flag, Retrieval Augmented Generation, Cybersecurity

TL;DR: This paper proposes a knowledge-based framework, built on retrieval-based navigation, that improves agentic systems at solving CTF challenges requiring domain-specific knowledge.

Abstract: Large Language Model (LLM) agents can automate cybersecurity tasks and adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities in Capture-The-Flag (CTF) competitions, they have two key limitations: accessing the latest cybersecurity expertise beyond their training data, and integrating new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into task-solving automation can address these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and solution injection that transforms insights into adaptive attack strategies. CRAKEN combines advanced retrieval algorithms with prompt injection-based integration. Comprehensive evaluations across different configurations show CRAKEN's effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding new security knowledge into LLM-driven cybersecurity agentic systems. Using a simple CTF write-up dataset that we collected, CRAKEN obtained an accuracy of 22% on NYU CTF Bench and shows a 15.7% difference in solution distribution, indicating the effectiveness of integrating knowledge into automated CTF solving for challenges that require domain-specific knowledge.
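The three mechanisms named in the abstract can be illustrated with a minimal toy sketch. This is not CRAKEN's implementation: the abstract does not specify its algorithms, so every name here (`Writeup`, `decompose`, `rewrite`, `iterative_retrieve`, `inject`, `build_prompt`, the `ALIASES` table) is hypothetical, keyword overlap stands in for a real retriever, and a fixed overlap threshold stands in for LLM self-reflection.

```python
from dataclasses import dataclass


@dataclass
class Writeup:
    # Hypothetical record for one entry of a CTF write-up knowledge base.
    title: str
    text: str


# Hypothetical abbreviation table used as a stand-in for LLM query rewriting.
ALIASES = {"bof": "buffer overflow", "sqli": "sql injection"}


def tokenize(s: str) -> set[str]:
    # Lowercase words with surrounding punctuation stripped.
    return {w.strip(".,:;()").lower() for w in s.split() if w.strip(".,:;()")}


def decompose(task: str) -> list[str]:
    # Stand-in for contextual decomposition: one sub-query per sentence.
    return [part.strip() for part in task.split(".") if part.strip()]


def retrieve(query: str, corpus: list[Writeup], k: int = 1) -> list[Writeup]:
    # Rank write-ups by keyword overlap with the query; drop zero-overlap ones.
    q = tokenize(query)
    scored = [(len(q & tokenize(w.text)), w) for w in corpus]
    scored = [(s, w) for s, w in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [w for _, w in scored[:k]]


def is_relevant(query: str, writeup: Writeup, min_overlap: int = 2) -> bool:
    # Stand-in for self-reflection: accept a hit only above an overlap threshold.
    return len(tokenize(query) & tokenize(writeup.text)) >= min_overlap


def rewrite(query: str) -> str:
    # Expand known abbreviations so the next retrieval round can match more text.
    return " ".join(ALIASES.get(w.lower(), w) for w in query.split())


def iterative_retrieve(query: str, corpus: list[Writeup], max_rounds: int = 3) -> list[Writeup]:
    # Retrieve, grade the hits, and rewrite the query if nothing relevant came back.
    current = query
    for _ in range(max_rounds):
        hits = [w for w in retrieve(current, corpus) if is_relevant(current, w)]
        if hits:
            return hits
        rewritten = rewrite(current)
        if rewritten == current:
            break  # rewriting no longer changes the query; give up
        current = rewritten
    return []


def inject(task: str, hints: list[Writeup]) -> str:
    # Stand-in for solution injection: append retrieved knowledge to the prompt.
    if not hints:
        return task
    lines = [f"- {w.title}: {w.text}" for w in hints]
    return task + "\n\nPossibly relevant knowledge:\n" + "\n".join(lines)


def build_prompt(task: str, corpus: list[Writeup]) -> str:
    # Decompose the task, retrieve per sub-query, then inject de-duplicated hints.
    hints: list[Writeup] = []
    for sub in decompose(task):
        for w in iterative_retrieve(sub, corpus):
            if w not in hints:
                hints.append(w)
    return inject(task, hints)
```

In this sketch, a task such as "Pwn the service. Find the bof" yields no relevant hit on the first retrieval round; the rewrite step expands "bof" to "buffer overflow", after which a write-up mentioning a stack buffer overflow would be retrieved and appended to the agent prompt. A task with no matching knowledge is passed through unchanged.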

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 13539
