Keywords: cooperative AI, multi-agent reinforcement learning, empowerment, disempowerment, assistance
Abstract: Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We show that assistive agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards—a phenomenon we formalize as "disempowerment." We characterize when disempowerment occurs in multi-agent environments and show that naive approaches do not fully solve this problem. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23747