TL;DR: We present a self-tuning RL agent that learns to adjust the accuracy of its observations when the observation cost is an intrinsic part of the environment.
Keywords: missing values, reinforcement learning, sample accuracy, observation cost
Abstract: We consider a reinforcement learning (RL) setting where there is a cost associated with making accurate observations. We propose a reward shaping framework and present a self-tuning RL agent that learns to adjust the accuracy of its samples. We consider two scenarios: in the first, the agent directly varies the accuracy level of each sample; in the second, the agent either observes a sample perfectly or misses it entirely. In contrast to existing work, which focuses on sample efficiency during training, we focus on the behavior of the agent when the observation cost is an intrinsic part of the environment. Our results illustrate that the RL agent can learn that not all samples are equally informative, and chooses to observe with high accuracy those that are most critical for the task at hand.
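As a rough illustration of the setting (not the authors' implementation), one can picture the first scenario as a Gymnasium-style wrapper: the action is augmented with an accuracy level, the observation is corrupted by noise that shrinks as accuracy increases, and the accuracy cost is subtracted from the reward as a shaping term. All names and parameters below (ObservationCostWrapper, cost_per_accuracy, max_noise_std) are hypothetical.

```python
# Minimal sketch of the cost-for-accuracy setting, under assumed names;
# this is an illustration of the idea, not the paper's actual code.
import numpy as np
import gymnasium as gym


class ObservationCostWrapper(gym.Wrapper):
    """Augments the action with an accuracy level in [0, 1].

    Higher accuracy yields a less noisy observation but incurs a cost
    that is subtracted from the environment reward (the shaping term).
    """

    def __init__(self, env, cost_per_accuracy=0.1, max_noise_std=1.0):
        super().__init__(env)
        self.cost_per_accuracy = cost_per_accuracy
        self.max_noise_std = max_noise_std
        # Original action space plus one extra dimension for the accuracy choice.
        low = np.append(env.action_space.low, 0.0)
        high = np.append(env.action_space.high, 1.0)
        self.action_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def step(self, action):
        env_action = action[:-1]
        accuracy = float(np.clip(action[-1], 0.0, 1.0))
        obs, reward, terminated, truncated, info = self.env.step(env_action)
        # Scenario 1: observation noise shrinks as the agent pays for accuracy.
        noise_std = self.max_noise_std * (1.0 - accuracy)
        noisy_obs = obs + np.random.normal(0.0, noise_std, size=np.shape(obs))
        # Shaped reward: task reward minus the intrinsic observation cost.
        shaped_reward = reward - self.cost_per_accuracy * accuracy
        return noisy_obs, shaped_reward, terminated, truncated, info


env = ObservationCostWrapper(gym.make("Pendulum-v1"))
```

Restricting the accuracy choice to the binary set {0, 1} (observe perfectly or miss entirely) would recover the second scenario.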