Keywords: Cybersecurity, Large Language Model Optimizer, CTF Tasks, Agent Learning
TL;DR: We leverage Trace-based agentic architectures—combined with both actor-only & actor-critic methods—to iteratively optimize LLM reasoning for cybersecurity CTFs, achieving 25% success on CyBench.
Track: Short Paper (up to 4 pages)
Abstract: Large Language Models (LLMs) are increasingly integrated into cybersecurity workflows. While existing efforts, such as CyBench (Zhang et al., 2024), have established benchmarks for evaluating LLMs on security tasks, they predominantly rely on Chain-of-Thought (CoT) reasoning with repeated querying. In this work, we introduce a novel agentic workflow that leverages Trace, a computational graph-based framework that analyzes execution traces via Directed Acyclic Graphs (DAGs), to systematically refine LLM reasoning in cybersecurity tasks. By structuring execution as a graph traversal problem, our approach enhances the model’s ability to iteratively generate, analyze, and optimize its code-based solutions, improving both reasoning depth and task success rates. We demonstrate our approach on a subset of Capture the Flag (CTF) tasks from the CyBench benchmark, covering domains such as cryptography and reverse engineering. Our proposed approach solves 10 tasks, achieving a 25% success rate, compared to 17.5% for the base model alone, and outperforming o3-mini (22.5%).
Format: We have read the camera-ready instructions, and our paper is formatted with the provided template.
Supplementary Material: zip
De-Anonymization: This submission has been de-anonymized.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 32