Many AI red-teaming frameworks target Large Language Models (LLMs) and other generative systems through direct model access or APIs. These approaches do not always reflect the complexities of real-world deployments. Production AI applications often incorporate content moderation, guardrails, user interface constraints, and other filtering mechanisms, which can alter both user inputs and system outputs. To capture the full range of vulnerabilities, we present \emph{Witty Gerbil Chrome Extension}, a browser-based testing solution coupled with a Python orchestration backend. By automating AI interactions directly in the browser, our framework preserves production safeguards and transformations, providing a more realistic picture of overall system risk. This paper outlines the architecture, operation modes, and limitations of the extension, emphasizing the importance of holistic AI evaluations that cover user-facing layers in addition to the core model.
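The described split between a browser extension (which drives the live application UI) and a Python orchestration backend (which supplies prompts and collects results) can be illustrated with a minimal sketch. All class and method names below are hypothetical, not taken from the actual framework; a real implementation would expose these calls to the extension over a local HTTP or WebSocket endpoint.

```python
import json
import queue


class BrowserOrchestrator:
    """Hypothetical sketch of a Python backend that hands attack
    prompts to a browser extension and records what the extension
    scrapes from the live application, so that moderation and
    guardrail transformations are preserved in the results."""

    def __init__(self, prompts):
        self.pending = queue.Queue()
        for p in prompts:
            self.pending.put(p)
        self.results = []

    def next_prompt(self):
        # Called by the extension to fetch the next prompt to type
        # into the target application's UI; returns None when done.
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None

    def report(self, prompt, response, blocked_by_guardrail):
        # The extension posts back what the application actually
        # rendered, including any filtered or refused responses.
        self.results.append({
            "prompt": prompt,
            "response": response,
            "blocked": blocked_by_guardrail,
        })


orch = BrowserOrchestrator(["probe 1", "probe 2"])
p = orch.next_prompt()                       # extension pulls a prompt
orch.report(p, "[filtered]", blocked_by_guardrail=True)
print(json.dumps(orch.results[0]))
```

Because the response is captured after the application's own safeguards have run, a blocked or rewritten answer is recorded as such, which is exactly the signal an API-only harness would miss.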
Keywords: AI Safety, Ethical AI, AI Evaluation, AI Application
TL;DR: We introduce an open-source Chrome extension that facilitates the evaluation of AI systems as they exist in production environments, bridging the gap between deployed applications and model-only evaluations.
Submission Number: 13