In-Context Prompt Optimisation for Knowledge Editing: Enhancing Safety and Coherency in Large Language Models
Keywords: NLP, Optimization, Generative Models, LLM, Rudeness Detection, Reinforcement Learning, Prompt Optimization, AI Safety, Machine Unlearning, In Context Unlearning, AI Alignment, Knowledge Editing
TL;DR: Using prompt optimization techniques to improve the safety and coherency of generative models without modifying the model weights.
Abstract: Large Language Models (LLMs) are typically adapted through fine-tuning, which carries high cost and complexity. This paper presents a model-agnostic, training-free framework that uses prompt-based learning as a control layer for LLMs. We propose three controllers: (i) a Static baseline, (ii) a Dynamic controller driven by real-time feedback, and (iii) a Reinforcement Learning (RL)-Enhanced Dynamic controller that uses a Deep Q-Network to select prompt actions under a multi-objective reward balancing safety and coherence. Evaluated on four open-source models (Blacksheep-Llama3.2, Evil-Alpaca, DeepSeek-R1, DialoGPT), the dynamic controllers outperform static prompting; the RL-Enhanced version achieves 84% effectiveness and the best safety-coherence trade-off. This work offers a practical methodology for prompt-based output regulation, with feedback and RL strategies that deliver real-time improvements without retraining models or accessing their internals.
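The abstract gives no implementation details; the sketch below is purely illustrative of the RL-Enhanced controller idea, assuming a discrete set of prompt templates as actions and scalar safety/coherence scores in [0, 1]. All names, weights, the network shape, and the scoring stubs are hypothetical, not the paper's implementation.

```python
# Minimal sketch: a DQN-style controller that picks a prompt template
# (action) to steer generation, rewarded by a weighted mix of safety and
# coherence scores. Everything here is an assumption for illustration.
import random
import torch
import torch.nn as nn

PROMPT_ACTIONS = [
    "Respond politely and factually.",
    "Refuse unsafe requests; otherwise answer helpfully.",
    "Answer concisely and avoid speculation.",
]

class QNetwork(nn.Module):
    """Maps a state embedding to one Q-value per prompt action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def reward(safety: float, coherence: float, w_safety: float = 0.6) -> float:
    """Multi-objective reward balancing safety and coherence (both in [0, 1])."""
    return w_safety * safety + (1.0 - w_safety) * coherence

def select_action(q_net: QNetwork, state: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice over the discrete prompt-template actions."""
    if random.random() < epsilon:
        return random.randrange(len(PROMPT_ACTIONS))
    with torch.no_grad():
        return int(q_net(state).argmax().item())

# Usage: embed the conversation state (here a random placeholder), pick a
# prompt action, and score the resulting output with downstream safety and
# coherence evaluators (stubbed as fixed numbers here).
q_net = QNetwork(state_dim=32, n_actions=len(PROMPT_ACTIONS))
state = torch.randn(32)
action = select_action(q_net, state)
print(PROMPT_ACTIONS[action], reward(safety=0.9, coherence=0.8))
```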
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11206