Abstract: Autonomous agents in real-world environments may encounter undesirable outcomes, or negative side effects (NSEs), when working collaboratively alongside other agents. We frame the challenge of minimizing NSEs in a multi-agent setting as a lexicographic decentralized Markov decision process in which rewards and transitions are assumed independent with respect to the primary assigned tasks, while negative side effects create a form of dependence among the agents. We present a lexicographic Q-learning approach that mitigates NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks, up to a given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods.
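To make the lexicographic objective concrete, the following is a minimal sketch of slack-constrained action selection over two Q-functions: one for the primary task and one for the NSE penalty. All names here (q_task, q_nse, slack) are illustrative assumptions, not identifiers from the paper, and the sketch omits the learning updates and the human feedback model.

```python
import numpy as np

def lexicographic_action(q_task: np.ndarray, q_nse: np.ndarray,
                         state: int, slack: float) -> int:
    """Greedy lexicographic action selection (illustrative sketch).

    q_task, q_nse: arrays of shape (n_states, n_actions) holding
    learned Q-values for the primary task and the NSE penalty.
    slack: how much primary-task value we are willing to sacrifice.
    """
    task_values = q_task[state]
    best = task_values.max()
    # Admissible actions: near-optimal for the task, within the slack.
    admissible = np.flatnonzero(task_values >= best - slack)
    # Among admissible actions, pick the one with the lowest NSE penalty.
    return int(admissible[np.argmin(q_nse[state, admissible])])
```

Under this reading, the slack parameter directly trades primary-task optimality for NSE mitigation: with slack = 0 the agent ignores NSEs except for tie-breaking, while larger slack admits more actions for the secondary objective to choose among.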