Abstract: Humans appraise their environment in daily life. We are implementing appraisal mechanisms in reinforcement learning agents. One such mechanism that we have proposed is utility-based Q-learning, which learns behaviors from subjective utilities derived from the payoffs the agent gains and the agent's utility-derivation function. Our previous work showed that payoff-based evolution yields utility-derivation functions that facilitate mutual cooperation in iterated prisoner’s dilemma games. However, the evolutionary process itself is not yet well understood. In this work, we investigate what determines the direction of evolution. We introduce two metrics that express action preferences based on the evolved subjective utilities; these metrics divide the evolution space into four regions. Within each region, the metrics explain the direction of evolution.
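To make the idea of learning from subjective utilities concrete, the following is a minimal sketch, not the authors' implementation. It assumes conventional prisoner's dilemma payoffs (5, 3, 1, 0), a state given by the previous joint action, and epsilon-greedy action selection. The names `UtilityQLearner`, `PAYOFF`, and `play`, and the representation of the utility-derivation function as a lookup table over outcomes, are all hypothetical.

```python
import random

# Assumed prisoner's dilemma payoffs for the acting player (T=5 > R=3 > P=1 > S=0).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


class UtilityQLearner:
    """Q-learner that updates on subjective utilities instead of raw payoffs."""

    def __init__(self, utility, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        # Hypothetical utility-derivation table: (own action, other's action) -> utility.
        self.utility = utility
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = {}  # (state, action) -> value; state is the previous joint action

    def act(self, state):
        # Epsilon-greedy selection (an assumption of this sketch).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, outcome, next_state):
        u = self.utility[outcome]  # subjective utility takes the place of the reward
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (u + self.gamma * best_next - old)


def play(agent_a, agent_b, rounds=1000):
    """Run an iterated game and return each agent's accumulated objective payoff."""
    state_a = state_b = None
    total_a = total_b = 0
    for _ in range(rounds):
        a, b = agent_a.act(state_a), agent_b.act(state_b)
        total_a += PAYOFF[(a, b)]
        total_b += PAYOFF[(b, a)]
        # Each agent sees the outcome from its own perspective.
        agent_a.update(state_a, a, (a, b), (a, b))
        agent_b.update(state_b, b, (b, a), (b, a))
        state_a, state_b = (a, b), (b, a)
    return total_a, total_b
```

In this sketch the agents learn only from `utility`, while `play` accumulates the objective payoffs separately. Under the assumptions above, those accumulated payoffs are the kind of quantity a payoff-based evolutionary procedure could use as fitness when selecting utility tables.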