Keywords: DeepRL, Data augmentation, Scheduling, Generalization, Sample efficiency, Optimization
TL;DR: We apply data augmentation at a task-dependent time, determined by whether the augmentation benefits RL training.
Abstract: We study data augmentation techniques for improving the data efficiency and generalization performance of reinforcement learning (RL). Our empirical study on OpenAI Procgen shows that the timing of augmentation is critical: to maximize test performance, an augmentation should be applied either throughout RL training or only after RL training ends. More specifically, if the regularization imposed by an augmentation is helpful only at test time, the augmentation is best applied after training rather than during it, because it often disturbs the training process. Conversely, an augmentation whose regularization is useful during training should be applied over the whole training period to fully exploit its benefits in both generalization and data efficiency. Based on these findings, we propose a mechanism that fully exploits a set of augmentations: it automatically identifies the best augmentation (or no augmentation) with respect to RL training performance, and then utilizes all the augmentations via network distillation after training to maximize test performance. Our experiments empirically justify the proposed method in comparison with other automatic augmentation mechanisms.
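The two-phase mechanism described in the abstract could be sketched as follows. This is a toy illustration, not the paper's implementation: the function names, the scalar training scores, and the averaging-based distillation targets are all hypothetical placeholders standing in for measured RL returns and real network outputs.

```python
def select_best_augmentation(augmentations, train_score):
    # Phase 1: include "no augmentation" (None) as a candidate and keep
    # whichever choice yields the best RL training performance.
    candidates = [None] + list(augmentations)
    return max(candidates, key=train_score)

def distillation_targets(teacher, views):
    # Phase 2 (toy): after training, the student is regressed toward the
    # teacher's output averaged over all augmented views of an observation,
    # so every augmentation contributes at test time.
    outputs = [teacher(v) for v in views]
    return sum(outputs) / len(outputs)

# Dummy numbers standing in for measured training returns.
scores = {None: 1.0, "random_crop": 1.3, "cutout": 0.8}
best = select_best_augmentation(["random_crop", "cutout"], lambda a: scores[a])
print(best)  # the augmentation with the highest (toy) training score

# Toy teacher whose "output" is just the length of the view string.
target = distillation_targets(lambda v: float(len(v)), ["ob", "obs", "obsv"])
print(target)
```

In practice, `train_score` would come from evaluating RL training curves under each augmentation, and the distillation step would minimize a divergence between student and teacher policies over augmented observations.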
Supplementary Material: zip