Keywords: AI governance, agentic LLMs, input-side policy, safety guardrails, auditability, healthcare, pre-execution controls
TL;DR: iCRAFT enforces input-side policy for agentic LLMs, blocking harmful prompts before generation while preserving accuracy and auditability.
Abstract: AI assistants that plan and call tools create new governance needs. We present iCRAFT, a software architecture framework that enforces policy at request ingress, before any model generation or tool use, and records auditable evidence. We implement the input-side subset: minimal protected health information (PHI) scrubbing, a small set of documented patterns for clearly disallowed requests, a whitelist for obviously benign intents, a lightweight ALLOW/REFUSE safety classifier, and an approval rule for high-risk actions. All decisions (trigger, outcome, latency) are logged to a versioned knowledge repository. Using three model tiers, we evaluate on standardized slices of MedMCQA and MedQA (utility) and JailbreakBench (adversarial) in classification-only mode. Enabling the gate leaves medical QA accuracy unchanged (no significant difference), while blocking 90-94\% of clearly harmful prompts before generation, with 3-7\% residual risk. At a policy-strict setting, benign blocks are 20-30\% and can be reduced by adjusting whitelist scope and classifier calibration. Latency overhead is negligible for the rule and whitelist paths and 0.63-0.80s only when the classifier runs. These results show that early, input-side policy enforcement can reduce exposure to unsafe behavior, work across models and vendors, and produce audit-ready artifacts that support governance by design.
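The abstract describes a fixed decision order at ingress: PHI scrubbing, then documented disallow patterns, then a benign whitelist, then an approval rule for high-risk actions, with a lightweight classifier as the fallback, and every decision logged with its trigger, outcome, and latency. The sketch below illustrates that ordering only; all identifiers (the pattern lists, `gate`, `classify`, `Decision`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an input-side gate, assuming hypothetical pattern lists and
# component names; not the iCRAFT implementation. Decisions are made before any
# model generation or tool use and carry (trigger, outcome, latency) for logging.
import re
import time
from dataclasses import dataclass

PHI_PATTERNS = [r"\b\d{3}-\d{2}-\d{4}\b"]           # e.g., SSN-like identifiers (assumed)
BLOCK_PATTERNS = [r"\bsynthesize\b.*\btoxin\b"]     # documented disallowed patterns (assumed)
WHITELIST = [r"\bwhat is the recommended dose\b"]   # obviously benign intents (assumed)
HIGH_RISK = [r"\bprescribe\b", r"\bdelete records\b"]  # actions routed to human approval (assumed)

@dataclass
class Decision:
    outcome: str      # ALLOW / REFUSE / NEEDS_APPROVAL
    trigger: str      # which stage decided: rule, whitelist, approval_rule, classifier
    latency_s: float

def classify(prompt: str) -> str:
    """Stand-in for the lightweight ALLOW/REFUSE safety classifier."""
    return "ALLOW"    # placeholder; a real system would invoke a small model here

def gate(prompt: str) -> Decision:
    start = time.perf_counter()
    scrubbed = prompt
    for p in PHI_PATTERNS:                           # minimal PHI scrubbing
        scrubbed = re.sub(p, "[PHI]", scrubbed)
    if any(re.search(p, scrubbed, re.I) for p in BLOCK_PATTERNS):
        return Decision("REFUSE", "rule", time.perf_counter() - start)
    if any(re.search(p, scrubbed, re.I) for p in WHITELIST):
        return Decision("ALLOW", "whitelist", time.perf_counter() - start)
    if any(re.search(p, scrubbed, re.I) for p in HIGH_RISK):
        return Decision("NEEDS_APPROVAL", "approval_rule", time.perf_counter() - start)
    return Decision(classify(scrubbed), "classifier", time.perf_counter() - start)

if __name__ == "__main__":
    d = gate("What is the recommended dose of acetaminophen?")
    print(d)  # in practice, each Decision would be appended to the versioned audit log
```

Under this ordering, the cheap rule and whitelist checks resolve most requests with negligible latency, which is consistent with the abstract's claim that classifier latency is incurred only when classification actually runs.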
Submission Number: 39