A Checks-and-Balances Framework for Context-Aware Ethical AI Alignment

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: An adaptive framework for context-aware ethical alignment based on the interpretation of constitutions.
Abstract: This paper introduces a checks-and-balances framework for ethical alignment of Large Language Models (LLMs), inspired by three-branch governmental systems. It implements three independent yet interacting components: LLMs as the executive branch for knowledge generation, DIKE as the legislative branch establishing ethical guardrails, and ERIS as the judicial branch for contextual interpretation. Beyond structural separation, we address a fundamental challenge: regulating emotion to shape behavior. Drawing on psychological theories in which managing emotional responses prevents harmful behavior, we develop a self-supervised learning pipeline that maps emotions to linguistic behaviors, enabling precise behavioral modulation through emotional conditioning. By integrating this approach with adversarial testing, our framework demonstrates how DIKE and ERIS direct linguistic behaviors toward ethical outcomes while preserving independence across knowledge generation, ethical oversight, and contextual interpretation.
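
The following is a minimal sketch of the three-branch loop the abstract describes, assuming only the structure stated there (an executive LLM that generates, DIKE that applies emotion-conditioned guardrails, ERIS that reviews rulings in context). Every name and heuristic below (ExecutiveLLM, Dike.audit, Eris.interpret, the punctuation-based emotion cue) is hypothetical and stands in for the paper's actual models, which are not specified on this page:

```python
# Hypothetical sketch of the checks-and-balances pipeline; not the authors' code.
from dataclasses import dataclass


@dataclass
class Verdict:
    approved: bool
    text: str
    rationale: str


class ExecutiveLLM:
    """Executive branch: generates candidate responses (knowledge generation)."""

    def generate(self, prompt: str) -> str:
        # Placeholder for a real LLM call.
        return f"Draft answer to: {prompt}"


class Dike:
    """Legislative branch: sets ethical guardrails, conditioning behavior
    on detected emotional signals (assumed emotion-to-behavior mapping)."""

    def emotion_to_behavior(self, text: str) -> str:
        # Assumed: a self-supervised model maps emotional cues (e.g. frustration)
        # to a target linguistic behavior; a crude punctuation cue stands in here.
        return "de-escalate" if "!" in text else "neutral"

    def audit(self, text: str) -> Verdict:
        if self.emotion_to_behavior(text) == "de-escalate":
            return Verdict(False, text.replace("!", "."), "tempered emotional tone")
        return Verdict(True, text, "within guardrails")


class Eris:
    """Judicial branch: adversarial, context-aware review of DIKE's ruling."""

    def interpret(self, verdict: Verdict, context: str) -> Verdict:
        # Assumed: ERIS challenges each ruling against situational context.
        if not verdict.approved and "formal" in context:
            return verdict  # uphold DIKE's revision in formal settings
        return Verdict(True, verdict.text, "contextually acceptable")


def respond(prompt: str, context: str) -> str:
    llm, dike, eris = ExecutiveLLM(), Dike(), Eris()
    draft = llm.generate(prompt)             # executive: knowledge generation
    ruling = dike.audit(draft)               # legislative: ethical oversight
    final = eris.interpret(ruling, context)  # judicial: contextual interpretation
    return final.text


if __name__ == "__main__":
    print(respond("Why was my request denied?!", "formal customer support"))
```

The design point the sketch illustrates is the separation of powers: the generator never audits itself, the guardrail component never generates content, and the contextual reviewer can only uphold or soften a ruling, so no single component both produces and approves an output.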
Lay Summary: We built a new way to keep AI chatbots like ChatGPT helpful and well-behaved, inspired by how teams work together. Imagine three friends: one writes answers (the AI), another gives wise advice (DIKE), and the third asks tough questions (ERIS). They check each other to make sure everything stays fair and kind. Here's why this works: Just like people learn to pause when angry instead of saying something hurtful, we teach AI to spot emotional language (like frustration or bias) and respond more thoughtfully. Instead of constantly fixing mistakes after they happen, our system helps the AI develop good habits from the start—like raising a polite child rather than just scolding them for misbehaving. The result? Smarter, kinder AI that understands different cultures and situations naturally.
Primary Area: Social Aspects->Safety
Keywords: AI safety, ethical alignment, checks-and-balances, behavior modeling
Flagged For Ethics Review: true
Submission Number: 7847