Keywords: AI retraining, repeated strategic interaction, bandit feedback
Abstract: Modern AI systems are increasingly retrained on data generated through interaction with users. Three forces are at play:
(i) users who strategically adapt their behavior,
(ii) a prompting interface which obscures user intent,
and (iii) the fact that AI is typically retrained "greedily," ignoring exploration-exploitation tradeoffs. We ask whether these dynamics lead to poor outcomes. We study a stylized model, focusing on the "nice" case in which the AI and the users have aligned incentives.
We identify two distinct failure modes. First, the system may fail to converge to an optimal Nash equilibrium (of the relevant stage game) due to limited exploration, instead stabilizing at a suboptimal outcome region. This mode is ubiquitous: it happens with positive probability for \emph{every} problem instance. Second, a non-degenerate subset of problem instances exhibits \emph{model deterioration}, whereby the system converges to an outcome that is strictly worse than the initial state.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Paper Type: Standard paper
Submission Number: 70