Keywords: Alignment, human-AI repeated interaction, Bayesian persuasion
Abstract: Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? We study a strategic setting where a human user interacts with multiple differently misaligned AI agents, none of which is individually well-aligned. Our key insight is that when the user's utility function lies approximately within the convex hull of the AI agents' utility functions (a condition that becomes weaker as more diverse models become available), strategic competition among the agents can yield outcomes comparable to interacting with a perfectly aligned model.
We model this as a multi-leader Stackelberg game extending Bayesian persuasion to multi-round conversations between differently informed parties. We prove three main results of increasing generality: (1) When perfect alignment would allow the user to learn her Bayes-optimal action, she is also able to learn her Bayes-optimal action in all equilibria under our convex hull condition; (2) Under a weaker assumption requiring only approximate utility learning, a non-strategic user employing quantal response achieves near-optimal utility in all equilibria; (3) When the user selects the best single AI to interact with after an evaluation period, near-optimal utility is guaranteed in equilibrium without any additional distributional assumptions.
We complement our theory with two empirical studies on ethical judgments (ETHICS) and movie recommendations (MovieLens). In each domain, we use 100 diverse LLM-based agents to label each instance with utilities, fit non-negative linear and simplex (convex) combinations of these utilities, and evaluate the MSE of the best fit with respect to a ground-truth "human" utility. Across both domains, the best utility function in the convex hull of the LLM utilities achieves substantially lower alignment error than the best single LLM utility does.
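The fitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins (not ETHICS or MovieLens labels), the agent count is reduced to 10 for brevity, and the helper names `project_to_simplex`, `fit_weights`, and `mse` are hypothetical. The optimizer used here (projected gradient descent) is one standard choice; the abstract does not say which solver was used.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_weights(U, y, constraint="simplex", steps=5000):
    """Projected gradient descent for min_w mean((U @ w - y) ** 2).

    U: (n_instances, n_agents) matrix of agent utilities.
    y: (n_instances,) target ("human") utility.
    constraint: "simplex" (convex combination) or "nonneg" (non-negative linear).
    """
    n, k = U.shape
    w = np.full(k, 1.0 / k)                   # uniform start, feasible for both sets
    # Step size 1/L, with L the Lipschitz constant of the gradient.
    L = 2.0 * np.linalg.norm(U, 2) ** 2 / n
    for _ in range(steps):
        grad = 2.0 * U.T @ (U @ w - y) / n
        w = w - grad / L
        if constraint == "simplex":
            w = project_to_simplex(w)
        else:
            w = np.maximum(w, 0.0)
    return w

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

# Synthetic stand-in data (an assumption for illustration only): the target
# utility is a noisy convex combination of the agents' utilities.
rng = np.random.default_rng(0)
n_instances, n_agents = 500, 10
U = rng.normal(size=(n_instances, n_agents))
true_w = rng.dirichlet(np.ones(n_agents))
y = U @ true_w + 0.05 * rng.normal(size=n_instances)

best_single = min(mse(U[:, j], y) for j in range(n_agents))
w_simplex = fit_weights(U, y, "simplex")
w_nonneg = fit_weights(U, y, "nonneg")
mse_simplex = mse(U @ w_simplex, y)
mse_nonneg = mse(U @ w_nonneg, y)

print("best single agent MSE:", best_single)
print("simplex fit MSE:      ", mse_simplex)
print("non-negative fit MSE: ", mse_nonneg)
```

Because every single agent corresponds to a vertex of the simplex, the converged simplex fit can never have higher in-sample MSE than the best single agent; how large the gap is on real data is an empirical question that this synthetic example does not answer.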
Submission Number: 66