Value-Aligned Imitation via Focused Satisficing

Published: 10 Oct 2024 · Last Modified: 15 Nov 2024 · Pluralistic-Alignment 2024 · CC BY 4.0
Keywords: Imitation Learning, Subdominance Minimization, Value Alignment
TL;DR: Value-aligned imitation learning from demonstrations chosen to be acceptable rather than (near-)optimal.
Abstract: According to *satisficing theory*, humans often choose *acceptable* behavior based on their personal *aspirations*, rather than achieving (near-)optimality. For example, a lunar lander demonstration that successfully lands without crashing might be acceptable to a novice despite being slow or jerky. When human aspirations are much lower than the capabilities of the autonomous system, this gap allows learned policies to sufficiently satisfy differing human objectives. Maximizing the likelihood of demonstrator satisfaction also provides guidance for learning under competing objectives that existing imitation learning methods struggle to resolve. Using a margin-based objective to guide deep reinforcement learning, our **focused satisficing** approach to imitation learning seeks a policy that surpasses the demonstrator's aspiration levels---defined over trajectories---on unseen demonstrations, *without explicitly learning those aspirations*. We show experimentally that this focuses the policy on imitating higher-quality demonstrations better than existing imitation learning methods, providing much higher rates of guaranteed acceptability to the demonstrator and competitive true returns across a range of environments.
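
To make the margin-based idea concrete, below is a minimal, self-contained sketch of a hinge-style subdominance penalty computed over per-trajectory cost features: the policy is penalized whenever a rollout fails to beat a demonstration on some feature by a scaled unit margin. This is an illustration under assumed names only; the feature representation, the fixed per-feature scaling `alpha`, and the uniform averaging over demonstrations are assumptions for exposition, not the paper's focused satisficing objective.

```python
# Illustrative sketch (assumed names, not the paper's exact objective):
# a margin-based hinge penalty that accrues whenever a policy rollout
# fails to undercut a demonstration's cost features by a scaled unit margin.
import numpy as np


def subdominance_hinge(policy_costs: np.ndarray,
                       demo_costs: np.ndarray,
                       alpha: np.ndarray) -> float:
    """Hinge penalty for one (rollout, demonstration) pair.

    policy_costs: shape (K,), cost features of one policy rollout
    demo_costs:   shape (K,), cost features of one demonstration
    alpha:        shape (K,), nonnegative per-feature margin scaling (assumed fixed here)
    """
    # Each term is positive when the policy is not better than the demo
    # on that feature by the required margin; zero once the margin is met.
    return float(np.sum(np.maximum(alpha * (policy_costs - demo_costs) + 1.0, 0.0)))


def batch_subdominance(policy_costs: np.ndarray,
                       demos_costs: np.ndarray,
                       alpha: np.ndarray) -> float:
    """Average hinge penalty against a set of demonstrations.

    Uniform averaging is an assumption; a focused variant could instead
    weight the demonstrations it aims to satisfy.
    """
    losses = [subdominance_hinge(policy_costs, d, alpha) for d in demos_costs]
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demos = rng.uniform(0.5, 1.5, size=(5, 3))   # 5 demonstrations, 3 cost features
    rollout = np.array([0.4, 0.6, 0.9])          # cost features of one policy rollout
    print(batch_subdominance(rollout, demos, alpha=np.ones(3)))
```

In a deep RL loop, a penalty of this shape could serve as a (negated) reward signal for policy optimization, so that reducing it pushes rollouts to clear the demonstrations' feature levels by a margin; the exact training procedure used in the paper is not reproduced here.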
Submission Number: 55