Instruction-Level Weight Shaping: A Framework for Self-Improving AI Agents

TMLR Paper 6079 Authors

03 Oct 2025 (modified: 10 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Large language models (LLMs) excel at surface fluency yet remain structurally static after pre-training; new or evolving domain knowledge is typically bolted on via retrieval-augmented generation (RAG) or parameter fine-tuning. In practice, RAG often retrieves facts without integrating them logically, and adds latency and engineering overhead. Free-form prompt injection and ad hoc prompt engineering are brittle, prone to context-window drift, and can conflict with pre-trained knowledge. Fine-tuning, while effective for specific domains, is resource-intensive and risks catastrophic forgetting. We propose Instruction-Level Weight Shaping (ILWS), which treats curated system instructions as external, auditable pseudo-parameters updated post-session via reflection and user feedback. After each session an LLM-driven Reflection Engine inspects the conversation trace, diagnoses reasoning successes or failures, and proposes typed deltas $\Delta K=(\Delta S,\Delta U,\Delta T)$ over instructions, user preferences, and tools. Each delta is version-controlled, evaluated under a sliding-window analysis of 1-5 star ratings, automatically repaired on first failure, and rolled back on repeated failure. When the accumulated edit budget crosses a threshold, the agent compiles a rating-weighted synthetic dataset and distils matured instruction-space gains into parameters, converting prompt-space improvements into weight-space updates without downtime. Empirically, ILWS makes explicit the low-rank shaping implicitly induced by context in transformer blocks and preserves governance while eliminating per-call retrieval. In enterprise support, ILWS raised throughput by 2.4--5.0$\times$ and cut audited hallucinations by $\sim$80% versus a frozen baseline. A real-world e-commerce platform PoC called "L0 Support" with 1M-token context achieved 4--5$\times$ gains in tickets/hour and an $\sim$80% reduction in time per ticket, with autonomous instruction updates and optional tool synthesis.
Because ILWS operates at the instruction layer until a controlled distillation stage, it generalises to dynamic domains (legal, medical, engineering) requiring adaptive reasoning, tool creation, and low-latency deployment.
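The delta lifecycle described in the abstract (typed deltas proposed post-session, gated by a sliding window of 1-5 star ratings, repaired on first failure, rolled back on repeated failure) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class names `Delta` and `InstructionStore`, the window size, and the acceptance threshold are all hypothetical choices for exposition.

```python
from dataclasses import dataclass
from collections import deque


@dataclass
class Delta:
    """A typed update proposed by the Reflection Engine (hypothetical schema)."""
    kind: str          # "S" (instructions), "U" (user preferences), or "T" (tools)
    patch: str         # the proposed edit text
    failures: int = 0  # consecutive failed evaluations of this delta


class InstructionStore:
    """Version-controlled pseudo-parameters gated by sliding-window ratings."""

    def __init__(self, window: int = 20, min_avg: float = 3.5):
        self.versions = [[]]                 # history of applied delta lists
        self.ratings = deque(maxlen=window)  # most recent 1-5 star ratings
        self.min_avg = min_avg               # acceptance threshold (assumed)

    def apply(self, delta: Delta) -> None:
        """Commit a new version containing the proposed delta."""
        self.versions.append(self.versions[-1] + [delta])

    def rate(self, stars: int) -> str:
        """Record a rating; repair on first failure, roll back on repeat."""
        self.ratings.append(stars)
        avg = sum(self.ratings) / len(self.ratings)
        if avg >= self.min_avg:
            return "keep"
        last = self.versions[-1][-1]
        last.failures += 1
        if last.failures == 1:
            return "repair"      # ask the Reflection Engine to fix the delta
        self.versions.pop()      # repeated failure: revert to prior version
        return "rollback"
```

In this sketch the store only tracks instruction-space state; the distillation step (compiling a rating-weighted synthetic dataset once the edit budget crosses a threshold) would consume `self.versions` downstream and is omitted here.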
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=3sJuTqY3zZ
Changes Since Last Submission: The previous submission was desk-rejected due to incorrect template usage. This resubmission addresses the issue by:
1. Using the official TMLR template: now properly using `\usepackage{tmlr}` with the official tmlr.sty style file.
2. Correct bibliography style: changed from plainnat to tmlr.bst as required.
3. Proper author format: using TMLR's `\name` and `\email` commands for author information.
4. Removed conflicting packages: eliminated packages that conflicted with the TMLR template (geometry, times, fontenc).
All content, methodology, experiments, and results remain unchanged; only the LaTeX template compliance has been corrected.
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 6079