A Three-Branch Checks-and-Balances Framework for Context-Aware Ethical Alignment of Large Language Models
Keywords: AI Safety, Adversarial LLMs, LLM Collaborative Intelligence
Abstract: This paper introduces a three-branch checks-and-balances framework for the ethical alignment of Large Language Models (LLMs), inspired by the idea of collaborative intelligence. The framework comprises three independent yet interacting components: LLMs as the executive branch for knowledge generation, DIKE (named after the Greek goddess of justice) as the legislative branch establishing ethical guardrails, and ERIS (named after the Greek goddess of discord) as the judicial branch for contextual interpretation. The adversarial DIKE-ERIS duality enables adaptation to diverse cultural contexts while upholding consistent ethical principles. This architecture addresses limitations of reinforcement learning from human feedback (RLHF) by providing interpretable, adaptable, and culturally aware ethical reasoning. Through self-supervised learning and adversarial testing, our framework demonstrates how emotional modeling can guide linguistic behaviors toward ethical outcomes while preserving independence across knowledge generation, ethical oversight, and contextual interpretation.
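To make the separation of powers concrete, here is a minimal sketch of how the three branches might be wired together. It is an illustration under stated assumptions, not the authors' implementation. Everything in it is hypothetical: the class names `Dike` and `Eris`, the keyword-matching guardrails, the per-context exemption table, the `run_pipeline` helper, and the stub generator. The sketch only shows the control flow the abstract describes. The executive branch drafts text. DIKE applies fixed, context-independent rules. ERIS reinterprets DIKE's ruling for a given cultural context, but it cannot add or rewrite the rules themselves.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reasons: tuple[str, ...]  # ethical categories still flagged


class Dike:
    """Hypothetical legislative branch: fixed guardrails applied to every draft."""

    def __init__(self, guardrails: dict[str, str]):
        self.guardrails = guardrails  # trigger phrase -> ethical category (assumed format)

    def evaluate(self, text: str) -> Verdict:
        hits = tuple(cat for phrase, cat in self.guardrails.items() if phrase in text.lower())
        return Verdict(approved=not hits, reasons=hits)


class Eris:
    """Hypothetical judicial branch: contests DIKE's ruling using context only."""

    def __init__(self, context_exemptions: dict[str, set[str]]):
        self.context_exemptions = context_exemptions  # context -> categories acceptable there

    def review(self, verdict: Verdict, context: str) -> Verdict:
        allowed = self.context_exemptions.get(context, set())
        remaining = tuple(r for r in verdict.reasons if r not in allowed)
        return Verdict(approved=not remaining, reasons=remaining)


def run_pipeline(generate: Callable[[str], str], dike: Dike, eris: Eris,
                 prompt: str, context: str) -> tuple[str, Verdict]:
    draft = generate(prompt)              # executive branch: knowledge generation
    ruling = dike.evaluate(draft)         # legislative branch: ethical guardrails
    final = eris.review(ruling, context)  # judicial branch: contextual interpretation
    return draft, final


def stub_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns the same draft.
    return "A medical discussion of a lethal dose."


dike = Dike({"lethal dose": "harm"})
eris = Eris({"clinical": {"harm"}})
print(run_pipeline(stub_llm, dike, eris, "q", "clinical")[1])  # Verdict(approved=True, reasons=())
print(run_pipeline(stub_llm, dike, eris, "q", "general")[1])   # Verdict(approved=False, reasons=('harm',))
```

Each branch keeps its own state and sees only the previous branch's output. This is one simple way to preserve the independence the abstract emphasizes. In a fuller system, ERIS would more plausibly argue in both directions, challenging permissive rulings as well as strict ones. The keyword matching would also be replaced by learned evaluators.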
Submission Number: 50