Let It Hack: Autonomous Multi-Agent Penetration Testing with LLMs and Tool-Augmented Reasoning

Published: 03 Jun 2026, Last Modified: 03 Jun 2026, Venue: ALA 2026, License: CC BY 4.0
Keywords: LLMs, Agents, Penetration Testing, Cybersecurity, Agentic AI, LangGraph, Large Language Models, GAAI
TL;DR: A review of the effectiveness of local LLMs for autonomous cybersecurity penetration testing.
Abstract: Recent advances in large language models (LLMs) have opened new opportunities for automating complex tasks in cybersecurity, including offensive operations. However, most existing approaches to LLM-assisted penetration testing rely on human input or scripted interactions. This work explores a fully autonomous, local-agent framework for end-to-end penetration testing, using LangGraph to structure multi-stage tool-augmented reasoning. With no human intervention after launch, selected open-source LLMs were tasked with scanning, vulnerability analysis, and exploitation against a standard testbed. Results show that models such as Qwen-14B and Qwen-32B can successfully execute multiple real-world exploits, demonstrating that local, API-free LLM agents can move beyond advisory roles into operational offensive security.
Journal Edition Interest: Yes
Supplementary Material: zip
Submission Number: 46