Abstract: Solving real-life sequential decision making problems under partial observability involves an exploration-exploitation problem. An agent needs to gather information about the state of the world for making rewarding decisions. However, in real-life, acquiring information is often highly costly, e.g., in the medical domain, information acquisition might correspond to performing a medical test on a patient. This poses a significant challenge for the agent to perform optimally for the task while reducing the cost for information acquisition. In this paper, we propose a model-based reinforcement learning framework that learns an active feature acquisition policy to solve the exploration-exploitation problem during its execution. Key to the success is a novel sequential variational auto-encoder. We demonstrate the efficacy of our proposed framework in a control domain as well as using a medical simulator, outperforming natural baselines and resulting in policies with greater cost efficiency.
1 Reply
Loading