Keywords: Agent Security, Red Teaming
TL;DR: Black-box agent testing via a 7-domain taxonomy + automated red teaming finds 98% of vulnerabilities with 10x less effort. Key risks: 56% governance and 65% privacy failure rates in multi-agent configurations
Abstract: Agentic systems are rapidly moving to production, where they read untrusted inputs, call tools with real permissions, and act autonomously, expanding the security surface beyond chat-only models. Yet standard evaluations remain single-turn and fail to capture multi-step agent vulnerabilities. We present a systematic black-box framework for risk-aware agent evaluation that requires only a basic system description. Our approach introduces: (1) a seven-domain taxonomy mapping observable behaviors to risk categories, (2) fully automated SAGE-RT red teaming producing 120 adversarial scenarios per domain, and (3) human-validated evaluation using LLM judges. Empirical validation across two agent frameworks (CrewAI and AutoGen) with four base models reveals alarming patterns: 56.25\% average governance risk, 65\% privacy risk in multi-agent configurations, and agent-behavior vulnerability rates reaching 85\%. Our black-box approach identifies critical architectural vulnerabilities without privileged access, providing a scalable path toward safer agent deployments.
Submission Number: 1