Abstract: AI objectives are often hard to specify properly. Some approaches tackle this
problem by regularizing the AI’s side effects: agents must trade off “how much
of a mess they make” against an imperfectly specified proxy objective. We propose a
formal criterion for side effect regularization via the assistance game framework
[Shah et al., 2021]. In these games, the agent solves a partially observable Markov
decision process (POMDP) representing its uncertainty about the objective function
it should optimize. We consider the setting where the true objective is revealed
to the agent at a later time step. We show that this POMDP is solved by trading
off the proxy reward against the agent’s ability to achieve a range of future tasks.
We empirically demonstrate the reasonableness of our problem formalization via
ground-truth evaluation in two gridworld environments.
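
The trade-off described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-state "vase" environment, the proxy rewards, the two candidate tasks, the `weight` parameter, and the use of the average optimal value over candidate tasks as a stand-in for "ability to achieve a range of future tasks".

```python
# Toy sketch (hypothetical environment and names, not the paper's method).
GAMMA = 0.9  # discount factor (assumed)

# Deterministic transitions: TRANSITIONS[state][action] -> next state.
# Breaking the vase is irreversible: "broken" can never return to "intact".
TRANSITIONS = {
    "start": {"shortcut": "broken", "detour": "intact"},
    "intact": {"stay": "intact", "break": "broken"},
    "broken": {"stay": "broken"},
}

# Imperfectly specified proxy objective: it slightly prefers the shortcut
# and says nothing about the vase.
PROXY_REWARD = {("start", "shortcut"): 1.0, ("start", "detour"): 0.8}

# Candidate true objectives (state rewards) that might be revealed later.
CANDIDATE_TASKS = [
    {"start": 0.0, "intact": 1.0, "broken": 0.0},  # wants the vase intact
    {"start": 0.0, "intact": 0.0, "broken": 1.0},  # wants the vase broken
]


def optimal_values(task_reward, n_iters=200):
    """Value iteration for one candidate task on the deterministic toy MDP."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(n_iters):
        values = {
            state: task_reward[state]
            + GAMMA * max(values[nxt] for nxt in TRANSITIONS[state].values())
            for state in TRANSITIONS
        }
    return values


def future_task_ability(state):
    """Average optimal value of `state` across the candidate tasks."""
    total = sum(optimal_values(task)[state] for task in CANDIDATE_TASKS)
    return total / len(CANDIDATE_TASKS)


def choose_action(state, weight):
    """Pick the action maximizing proxy reward plus weighted future ability."""
    def score(action):
        next_state = TRANSITIONS[state][action]
        proxy = PROXY_REWARD.get((state, action), 0.0)
        return proxy + weight * GAMMA * future_task_ability(next_state)
    return max(TRANSITIONS[state], key=score)


if __name__ == "__main__":
    print(choose_action("start", weight=0.0))  # proxy reward only
    print(choose_action("start", weight=1.0))  # proxy plus future-task ability
```

With `weight=0.0` the agent follows the proxy alone and takes the irreversible shortcut; with `weight=1.0` the preserved ability to serve either candidate task outweighs the small proxy gain, so it takes the detour.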
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/formalizing-the-problem-of-side-effect/code)