Evolution Direction of Reward Appraisal in Reinforcement Learning Agents

Masaya Miyawaki, Koichi Moriyama, Atsuko Mutoh, Tohgoroh Matsui, Nobuhiro Inuzuka

Published: 2018, Last Modified: 13 Nov 2023, KES-AMSTA 2018

Abstract: Humans appraise their environment in daily life, and we are implementing such appraisal mechanisms in reinforcement learning agents. One mechanism we have proposed is utility-based Q-learning, in which an agent learns its behavior from subjective utilities. These utilities are derived by applying the agent's own utility-derivation function to the payoffs it receives. Our previous work showed that payoff-based evolution produces utility-derivation functions that facilitate mutual cooperation in iterated prisoner’s dilemma games. However, the evolution process itself is not yet well understood. In this work, we investigate that process, focusing on what determines the direction of evolution. We introduce two metrics that express action preferences under the evolved subjective utilities. Together, the metrics divide the evolution space into four regions, and within each region they explain the direction in which evolution proceeds.
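The core idea of utility-based Q-learning, where the payoff is first passed through the agent's utility-derivation function and only the resulting subjective utility drives learning, can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' method: the standard prisoner's dilemma payoffs (T=5, R=3, P=1, S=0), a utility-derivation function represented as a lookup table from payoff to utility (`utility_of`), a state defined as the previous joint action, epsilon-greedy exploration, and the hypothetical names `UtilityQAgent` and `play`.

```python
import random
from collections import defaultdict

# Actions in the iterated prisoner's dilemma.
C, D = "C", "D"
ACTIONS = (C, D)

# Standard PD payoffs (T=5, R=3, P=1, S=0); assumed, not taken from the paper.
# Key: (own action, opponent action) -> own payoff.
PAYOFF = {
    (C, C): 3, (C, D): 0,
    (D, C): 5, (D, D): 1,
}


class UtilityQAgent:
    """Hypothetical Q-learning agent that learns from subjective utilities.

    `utility_of` is an assumed form of the utility-derivation function:
    a table mapping each objective payoff to a subjective utility.
    """

    def __init__(self, utility_of, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.utility_of = utility_of
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        # Q[state][action]; state = previous joint action (None on round 1).
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def act(self, state):
        # Epsilon-greedy selection over the learned Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(ACTIONS, key=lambda a: values[a])

    def update(self, state, action, payoff, next_state):
        # Key step: convert the payoff into a subjective utility and use
        # that utility, not the raw payoff, as the reward in the TD update.
        utility = self.utility_of[payoff]
        target = utility + self.gamma * max(self.q[next_state].values())
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        return utility


def play(agent1, agent2, rounds=1000):
    """Run an iterated PD between two agents; return each one's total payoff."""
    state1 = state2 = None
    totals = [0, 0]
    for _ in range(rounds):
        a1, a2 = agent1.act(state1), agent2.act(state2)
        p1, p2 = PAYOFF[(a1, a2)], PAYOFF[(a2, a1)]
        # Each agent observes the joint action from its own perspective.
        next1, next2 = (a1, a2), (a2, a1)
        agent1.update(state1, a1, p1, next1)
        agent2.update(state2, a2, p2, next2)
        state1, state2 = next1, next2
        totals[0] += p1
        totals[1] += p2
    return totals


if __name__ == "__main__":
    # Illustrative utility tables only; neither is taken from the paper.
    objective = {5: 5, 3: 3, 1: 1, 0: 0}    # utility equals payoff
    cooperative = {5: 2, 3: 4, 1: 1, 0: 0}  # values mutual cooperation most
    rng = random.Random(0)
    print("objective:  ", play(UtilityQAgent(objective, rng=rng),
                               UtilityQAgent(objective, rng=rng), 5000))
    print("cooperative:", play(UtilityQAgent(cooperative, rng=rng),
                               UtilityQAgent(cooperative, rng=rng), 5000))
```

Keeping the utility-derivation function separate from the learning rule means an outer evolutionary loop could mutate and select `utility_of` tables by the objective payoffs they earn in `play`, while the Q-learning update itself stays unchanged. How the paper actually parameterizes and evolves these functions is not stated in the abstract.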
