Instruction-Level Weight Shaping: A Framework for Self-Improving AI Agents

TMLR Paper 6079 Authors

03 Oct 2025 (modified: 20 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: Large language models (LLMs) excel at surface fluency yet remain structurally static after pre-training; new or evolving domain knowledge is typically bolted on via retrieval-augmented generation (RAG) or parameter fine-tuning, but RAG often retrieves facts without integrating them logically and adds latency, while fine-tuning is resource-intensive and risks catastrophic forgetting. We propose Instruction-Level Weight Shaping (ILWS), which treats curated system instructions as external, auditable pseudo-parameters updated post-session via reflection and user feedback: after each session an LLM-driven Reflection Engine inspects the conversation trace, diagnoses reasoning successes or failures, and proposes typed deltas ΔK = (ΔS, ΔU, ΔT) over instructions, user preferences, and tools; each delta is version-controlled, evaluated under a sliding-window analysis of 1–5 star ratings, automatically repaired on first failure, and rolled back on repeated failure; and when the accumulated edit budget crosses a threshold, the agent can optionally compile a rating-weighted synthetic dataset and distil matured instruction-space gains into parameters. Empirically, ILWS makes explicit the low-rank shaping implicitly induced by context in transformer blocks and preserves governance while eliminating per-call retrieval: in a real-world e-commerce platform proof of concept (PoC) called "L0 Support" with 1M-token context, a single operator using the reflection-driven knowledge accumulation achieved 4–5× gains in tickets/hour and ∼80% reduction in time per ticket, with first-shot resolution improving from ∼20% to ∼90%; when the matured instruction base was deployed to six additional operators without further reflection updates, they reported comparable gains, suggesting that ILWS produces transferable domain specialisation akin to fine-tuning but without parameter modification. Because ILWS operates at the instruction layer, it generalises to dynamic domains (legal, medical, engineering) requiring adaptive reasoning, tool creation, and low-latency deployment.
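The reflection-and-rollback loop described in the abstract (typed deltas ΔK = (ΔS, ΔU, ΔT), a sliding-window rating gate, repair on first failure, rollback on repeated failure) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class names, window size, rating threshold, and failure-count policy are all our assumptions.

```python
from dataclasses import dataclass
from collections import deque


@dataclass
class Delta:
    """Typed delta ΔK = (ΔS, ΔU, ΔT): edits to system instructions,
    user preferences, and tools (names assumed for illustration)."""
    d_instructions: dict
    d_preferences: dict
    d_tools: dict


class InstructionStore:
    """Version-controlled pseudo-parameters with a sliding-window
    rating gate. Window size and threshold are illustrative."""

    def __init__(self, window: int = 20, threshold: float = 4.0):
        self.versions = [{"S": {}, "U": {}, "T": {}}]  # version history
        self.ratings = deque(maxlen=window)            # recent 1-5 star ratings
        self.threshold = threshold
        self.failures = 0

    def apply(self, delta: Delta) -> None:
        """Commit a reflection-proposed delta as a new version."""
        head = {k: dict(v) for k, v in self.versions[-1].items()}
        head["S"].update(delta.d_instructions)
        head["U"].update(delta.d_preferences)
        head["T"].update(delta.d_tools)
        self.versions.append(head)
        self.failures = 0

    def rate(self, stars: int) -> str:
        """Record feedback; repair on first failure, roll back on repeat."""
        self.ratings.append(stars)
        mean = sum(self.ratings) / len(self.ratings)
        if mean < self.threshold:
            self.failures += 1
            if self.failures >= 2 and len(self.versions) > 1:
                self.versions.pop()  # roll back the offending delta
                self.failures = 0
                return "rolled_back"
            return "repair"  # trigger an automatic repair attempt
        return "ok"
```

Under these assumptions, a delta that drags the windowed mean rating below the threshold first triggers a repair attempt and, on a second consecutive failure, is popped from the version history, restoring the previous instruction set.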
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=3sJuTqY3zZ
Changes Since Last Submission:

Formatting:
- Abstract merged into a single paragraph as required by TMLR guidelines
- Broader Impact Statement moved to after Conclusion
- Abbreviated terms spelled out on first use: "proof of concept (PoC)", "95th percentile (p95)", "multi-layer perceptron (MLP)"

Addressing Reviewer Concerns:

Single-operator bias:
- Added "Multi-Operator Deployment (Observational)" paragraph: after the instruction base matured, it was deployed in frozen form to six additional operators who reported comparable gains, with qualitative feedback included
- Added convergence observation: reflection proposals decreased over time, reaching 10+ sessions with no new deltas

RAG baseline:
- Expanded RAG description with full configuration (text-embedding-3-small, 400-token chunks, 100-token overlap)
- Documented two specific failure modes: chunk incompleteness and lack of authoritative integration
- Clarified that RAG is now used for optional, non-authoritative context only

Distillation clarification:
- Explicitly stated that distillation (Phase 4) was never executed because performance remained excellent
- All reported gains are pre-distillation, instruction-space only

Statistical gate:
- Added acknowledgment that the gate is an engineering safeguard, not a formal hypothesis-testing framework; it does not account for temporal autocorrelation or multiple testing

Reproducibility:
- Added model details: Gemini-2.5-pro, temperature 0.7, chosen for its 1M-token context window
- Added cross-model validation: instruction base tested with Claude Sonnet 4, Sonnet 4.5, and Opus 4.5
- Clarified working hours: 3-4 hours/day part-time vs the 7.5-hour team standard
- Added first-shot success definition and recent validation data (74 tickets, 6 follow-ups)

Clarity improvements:
- Added platform scale: "hosting over 10,000 merchants"
- Clarified "single shot once the instruction base matured"
- Added illustrative example (Paris/Brasília) demonstrating system-instruction authority vs retrieved context
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 6079