In-Context Prompt Optimisation for Knowledge Editing: Enhancing Safety and Coherency in Large Language Models
Keywords: NLP, Optimization, Generative Models, LLM, Rudeness Detection, Reinforcement Learning, Prompt Optimization, AI Safety, Machine Unlearning, In Context Unlearning, AI Alignment, Knowledge Editing
TL;DR: Using prompt optimization techniques to improve the safety and coherency of generative models without modifying the model weights.
Abstract: Large Language Models (LLMs) are typically adapted through fine-tuning, which carries high cost and complexity. This paper presents a model-agnostic, training-free framework that uses prompt-based learning as a control layer for LLMs. We propose three controllers: (i) a Static baseline, (ii) a Dynamic controller driven by real-time feedback, and (iii) a Reinforcement Learning (RL)-Enhanced Dynamic controller that uses a Deep Q-Network to select prompt actions under a multi-objective reward balancing safety and coherence. Evaluated on four open-source models (Blacksheep-Llama3.2, Evil-Alpaca, DeepSeek-R1, DialoGPT), the dynamic controllers outperform static prompting; the RL-Enhanced version achieves 84% effectiveness and the best safety-coherence trade-off. This work offers a practical methodology for prompt-based output regulation, with feedback and RL strategies that deliver real-time improvements without retraining models or accessing their internals.
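The abstract gives no implementation details; the sketch below is purely illustrative of the RL-Enhanced controller idea, assuming a discrete set of prompt templates as actions and scalar safety/coherence scores in [0, 1]. All names, weights, the network shape, and the scoring stubs are hypothetical, not the paper's implementation.

```python
# Minimal sketch: a DQN-style controller that picks a prompt template
# (action) to steer generation, rewarded by a weighted mix of safety and
# coherence scores. Everything here is an assumption for illustration.
import random
import torch
import torch.nn as nn

PROMPT_ACTIONS = [
    "Respond politely and factually.",
    "Refuse unsafe requests; otherwise answer helpfully.",
    "Answer concisely and avoid speculation.",
]

class QNetwork(nn.Module):
    """Maps a state embedding to one Q-value per prompt action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def reward(safety: float, coherence: float, w_safety: float = 0.6) -> float:
    """Multi-objective reward balancing safety and coherence (both in [0, 1])."""
    return w_safety * safety + (1.0 - w_safety) * coherence

def select_action(q_net: QNetwork, state: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice over the discrete prompt-template actions."""
    if random.random() < epsilon:
        return random.randrange(len(PROMPT_ACTIONS))
    with torch.no_grad():
        return int(q_net(state).argmax().item())

# Usage: embed the conversation state (here a random placeholder), pick a
# prompt action, and score the resulting output with downstream safety and
# coherence evaluators (stubbed as fixed numbers here).
q_net = QNetwork(state_dim=32, n_actions=len(PROMPT_ACTIONS))
state = torch.randn(32)
action = select_action(q_net, state)
print(PROMPT_ACTIONS[action], reward(safety=0.9, coherence=0.8))
```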
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11206