Norm Compliance in Reinforcement Learning Agents via Restraining Bolts

Emery A. Neufeld, Agata Ciabattoni, Radu Florin Tulcan

Published: 15 Dec 2024, Last Modified: 15 Jan 2026 | OpenReview Archive Direct Upload | Readers: Everyone | License: CC BY-NC 4.0

Abstract: We modify the restraining bolt technique, originally designed for safe reinforcement learning, to regulate agent behavior in alignment with social, ethical, and legal norms. Rather than maximizing rewards for norm compliance, our approach minimizes penalties for norm violations. In case studies, we demonstrate that our approach captures benchmark challenges in normative reasoning, such as contrary-to-duty obligations, exceptions, and temporal obligations.
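
The penalty-based idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the norm ("stay out of a restricted zone; if you enter it, sound an alarm on the next step"), the labels `in_zone` and `alarm`, the penalty values, and the names `NormMonitor` and `shaped_return` are all assumptions chosen for illustration. The sketch shows a monitor that tracks one contrary-to-duty obligation and returns non-positive penalties, which are added to the environment reward.

```python
from dataclasses import dataclass


@dataclass
class NormMonitor:
    """Hypothetical monitor for one primary norm and its contrary-to-duty norm."""
    violation_penalty: float = 1.0  # assumed cost of violating the primary norm
    ctd_penalty: float = 2.0        # assumed extra cost of skipping the reparation
    owes_reparation: bool = False   # monitor state: a reparation is due this step

    def step(self, labels: set) -> float:
        """Consume the labels true at this step and return a penalty (<= 0)."""
        penalty = 0.0
        # Contrary-to-duty check: a reparation owed from the previous step
        # must be discharged now, otherwise a second penalty applies.
        if self.owes_reparation:
            if "alarm" not in labels:
                penalty -= self.ctd_penalty
            self.owes_reparation = False
        # Primary norm check: entering the zone is a violation and
        # triggers the reparation obligation for the next step.
        if "in_zone" in labels:
            penalty -= self.violation_penalty
            self.owes_reparation = True
        return penalty


def shaped_return(env_rewards, label_trace, monitor):
    """Sum environment rewards plus monitor penalties over one episode."""
    total = 0.0
    for reward, labels in zip(env_rewards, label_trace):
        total += reward + monitor.step(labels)
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    compliant = [set(), set(), set()]
    repaired = [{"in_zone"}, {"alarm"}, set()]
    unrepaired = [{"in_zone"}, set(), set()]
    for trace in (compliant, repaired, unrepaired):
        print(shaped_return(rewards, trace, NormMonitor()))
```

Under these assumed values, a compliant episode keeps its full return, a violation followed by the reparation loses only the primary penalty, and a violation without reparation loses both. A learning agent maximizing the shaped return therefore minimizes accumulated penalties, which is the direction the abstract describes.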