Abstract: The interaction between an artificial agent and its environment is bi-directional: the agent extracts relevant information from the environment and, in return, affects the environment through its actions in order to accumulate high expected reward. Standard reinforcement learning (RL) deals with expected reward maximization alone. However, there are always information-theoretic limitations that restrict the achievable expected reward, and these are not properly accounted for by standard RL. In this work we consider RL objectives subject to information-theoretic limitations. For the first time, we derive a Bellman-type recursive equation for the causal information between the environment and the agent, which combines naturally with the Bellman recursion for the value function. The unified equation serves to explore the typical behavior of artificial agents over an infinite time horizon.
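For orientation, a minimal sketch of the kind of recursion the abstract alludes to, in standard notation. The first equation is the textbook Bellman recursion for the value function; the exact form of the paper's causal-information recursion is not given in the abstract, so the second equation is only an illustrative assumption about its structure, with a hypothetical per-step information cost $i(s_t, a_t)$ (e.g. a KL term):

$$V^\pi(s_t) = \mathbb{E}_{\pi}\left[\, r(s_t, a_t) + \gamma\, V^\pi(s_{t+1}) \,\middle|\, s_t \right],$$

$$I^\pi(s_t) = \mathbb{E}_{\pi}\left[\, i(s_t, a_t) + \gamma\, I^\pi(s_{t+1}) \,\middle|\, s_t \right].$$

A unified objective of the kind described would then trade off $V^\pi$ against $I^\pi$ within a single recursive equation.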