- Keywords: Representation Learning, Reinforcement Learning, Active Learning
- Abstract: Solving real-life sequential decision making problems under partial observability involves an exploration-exploitation problem. To be successful, an agent needs to efficiently gather valuable information about the state of the world for making rewarding decisions. However, in real-life, acquiring valuable information is often highly costly, e.g., in the medical domain, information acquisition might correspond to performing a medical test on a patient. Thus it poses a significant challenge for the agent to learn optimal task policy while efficiently reducing the cost for information acquisition. In this paper, we introduce a model-based framework to solve such exploration-exploitation problem during its execution. Key to the success is a sequential variational auto-encoder which could learn high-quality representations over the partially observed/missing features, where such representation learning serves as a prime factor to drive efficient policy training under the cost-sensitive setting. We demonstrate our proposed method could significantly outperform conventional approaches in a control domain as well as using a medical simulator.
- One-sentence Summary: A novel active learning method for sequential decision making.
- Supplementary Material: zip