When Parametric Knowledge Wins: A Controlled Ablation of Agent Skills and Tool Use for PII Detection in Small Language Models

TMLR Paper7688 Authors

26 Feb 2026 (modified: 14 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: Agent augmentation is widely assumed to improve performance, yet this study shows that for small language models it systematically degrades capability under controlled conditions. The paper identifies a structural failure mode in agentic pipelines: augmentation that is assumed to benefit capable models instead causes a capability regression in the 7–9B parameter class. A controlled ablation was run across four open-weight instruction-tuned models (Gemma 2 9B, Llama 3.1 8B, Mistral 7B, Qwen 2.5 7B) and four conditions: zero-shot prompting, documentation injection (+Docs), tool access (+Tool), and skills injection (+Skills). The benchmark is a stratified 2,000-sample dataset drawn from three public PII sources and scored against PII-Codex canonical types after full label alignment. Under strict canonical-to-canonical scoring, zero-shot prompting outperforms every augmented condition for every included model. Tool use and skills injection reduce mean F1 by 13 to 24 percentage points relative to zero-shot (p < 0.0001, Cohen’s d from −0.39 to −0.67). Documentation injection is mostly neutral, though it significantly hurts Llama 3.1 8B (∆ = −0.17). Adding a Skill document on top of tool access provided no measurable benefit for any model. The degradation is not uniform: structured types such as Date and IP Address improve under tool use, while temporal (Date Time) and medical (Health Insurance ID) types collapse to near zero, driven by label-schema mismatches between PII-Codex output and ground truth. Implications are discussed for evaluation methodology and agentic pipeline design in the 7–9B parameter class.
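The two quantities the abstract reports, F1 under strict canonical-to-canonical matching and Cohen's d between conditions, can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes scoring at the level of sets of canonical type labels per sample and a pooled-standard-deviation form of Cohen's d, neither of which the abstract specifies. The function names `strict_f1` and `cohens_d` are hypothetical.

```python
from statistics import mean, stdev


def strict_f1(predicted, gold):
    """F1 over sets of canonical type labels, using exact string match only.

    Assumption: scoring is per sample over label sets; the paper may
    instead score spans or aggregate differently.
    """
    pred, true = set(predicted), set(gold)
    if not pred and not true:
        # Assumption: an empty prediction on an empty sample counts as perfect.
        return 1.0
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def cohens_d(a, b):
    """Cohen's d (mean of a minus mean of b) with a pooled standard deviation.

    Assumption: the pooled-variance variant; the paper may use another form.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


# A label-schema mismatch: the tool emits "Date" where the ground truth
# says "Date Time", so strict matching credits nothing.
mismatch_score = strict_f1(["Date"], ["Date Time"])
aligned_score = strict_f1(["Date", "IP Address"], ["Date", "IP Address"])
```

Under strict matching a near-miss label earns no partial credit, which is one way a type such as Date Time could score near zero even when the underlying entity was found.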
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Xinrun_Wang1
Submission Number: 7688