Tractable Agreement Protocols

Published: 10 Oct 2024, Last Modified: 15 Nov 2024 · Pluralistic-Alignment 2024 · CC BY 4.0
Keywords: agreement protocols, human-AI interaction, calibration, trustworthy ML
TL;DR: We give a reduction to construct accuracy-preserving human-AI agreement protocols through calibration, an algorithmically enforceable condition on predictions.
Abstract: We give an efficient reduction through which any machine learning algorithm can be converted into an interactive protocol that engages with another party (such as a human) to reach agreement on predictions and improve accuracy. The requirements on each party are calibration conditions: computationally and statistically tractable relaxations of Bayesian rationality that are sensible even in prior-free settings, and hence a substantial generalization of Aumann's classic ``agreement theorem'' \cite{aumann1976}. In the interactive protocol, the machine learning model first produces a prediction. The human then responds to the model's prediction either by conveying agreement or by providing feedback of some sort. The model then updates its state and provides a new prediction, and the human in turn may update their beliefs. The process continues until the model and the human reach agreement. The first setting we study generalizes past work on Aumann's Agreement Theorem, in which the parties aim to agree on a one-dimensional expectation. At each round, each party simply communicates an estimate of their current prediction for the expectation. In this setting we recover the quantitative convergence theorem of Aaronson \cite{aaronson2004complexity}, but under our much weaker assumptions. We then move on to the case in which the parties maintain beliefs about a distribution over $d$ outcomes, and we consider two feedback mechanisms. The first simply corresponds to a vector-valued estimate of the agents' current prediction. The second takes a decision-theoretic perspective: if the human needs to take some downstream action from a finite set, and has an arbitrary utility function of their action and the outcome, then we show that the parties can communicate and reach agreement about the correct downstream action to take by simply communicating, at each round, the action that they believe to be utility maximizing.
The number of rounds until agreement remains independent of $d$ in this case. We can also generalize our protocols to more than 2 parties, with computational complexity that degrades only linearly with the number of parties. Our protocols are based on simple, efficiently maintainable conditions and result in predictions that are more accurate than any single party's alone.
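The interaction loop described in the abstract can be illustrated with a schematic sketch. The function names, the averaging update rule, and the stopping tolerance below are all illustrative assumptions, not the paper's calibration-based construction; the toy parties simply move toward each other's estimates, standing in for the calibrated updates the paper actually requires.

```python
def agreement_protocol(model_respond, human_respond, epsilon=0.01, max_rounds=100):
    """Schematic interaction loop for the one-dimensional setting:
    parties exchange scalar estimates until they differ by at most
    epsilon. Toy stand-in for the calibration-based protocol."""
    model_est = model_respond(None)  # initial model prediction, no feedback yet
    for t in range(max_rounds):
        human_est = human_respond(model_est)  # human replies with own estimate
        if abs(model_est - human_est) <= epsilon:
            return model_est, human_est, t + 1  # agreement reached
        model_est = model_respond(human_est)  # model updates on the feedback
    return model_est, human_est, max_rounds

def make_averaging_party(initial):
    """Hypothetical party whose update rule averages its current
    estimate with the other party's last message."""
    state = {"est": initial}
    def respond(other_est):
        if other_est is not None:
            state["est"] = 0.5 * (state["est"] + other_est)
        return state["est"]
    return respond

model = make_averaging_party(0.9)
human = make_averaging_party(0.1)
m_final, h_final, rounds = agreement_protocol(model, human, epsilon=0.01)
```

With these averaging parties, the estimates contract toward each other geometrically, so the loop terminates well before `max_rounds`; the paper's results bound the number of rounds under the far weaker calibration conditions rather than any specific update rule.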
Submission Number: 25