- Keywords: Multi-batch, batch reinforcement learning, sample transfer
- Abstract: Reinforcement learning (RL), especially deep reinforcement learning, has achieved impressive performance on a variety of control tasks. Unfortunately, most online RL algorithms require a large number of interactions with the environment to learn a reliable control policy. This assumption of repeated access to the environment does not hold for many real-world applications, due to safety concerns, the cost or inconvenience of interaction, or the lack of an accurate simulator for effective sim2real training. As a consequence, there has been a surge of research addressing this issue, including batch reinforcement learning. Batch RL aims to learn a good control policy from a previously collected dataset. Most existing batch RL algorithms are designed for a single-batch setting and assume that the fixed dataset contains a large number of interaction samples. These assumptions limit the applicability of batch RL algorithms in the real world. We use transfer learning to address this data-efficiency challenge, and evaluate our approach on multiple continuous control tasks against several strong baselines. Compared with other batch RL algorithms, the methods described here can handle more general real-world scenarios.
- One-sentence Summary: We propose to use sample transfer and policy distillation to improve task-level generalization for batch reinforcement learning algorithms.