AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking

Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang

Sep 27, 2018, ICLR 2019 Conference Blind Submission
  • Abstract: Visual Active Tracking (VAT) aims at following a target object by autonomously controlling the motion system of a tracker given visual observations. Previous work has proposed a Single-Agent Reinforcement Learning (SARL) approach, in which the tracker is trained in a simulator and then performs tracking in real-world scenarios. However, during training, such an SARL method requires manually specifying the moving path of the target object to be tracked, which can hurt the tracker's generalization to unseen object moving patterns. To learn a robust tracker for VAT, in this paper we propose a novel Multi-Agent RL (MARL) training method which adopts an Asymmetric Dueling mechanism, referred to as AD-VAT. In AD-VAT, both the tracker and the target are approximated by deep networks and are trained via end-to-end RL in a dueling/competitive manner: the tracker tries to lock onto the target, while the target tries to escape from the tracker. They are asymmetric in that the target is aware of the tracker, but not vice versa. Specifically, besides its own observation, the target is fed the tracker's observation and action, and learns to predict the tracker's reward as an auxiliary task. We show that such an asymmetric dueling mechanism produces a stronger target, which in turn induces a more robust tracker. To stabilize the training, we also propose a novel partial zero-sum reward for the tracker/target. The experimental results, in both 2D and 3D environments, demonstrate that the proposed method leads to faster convergence in training and yields more robust tracking behaviors in different testing scenarios. For supplementary videos, see: https://www.youtube.com/playlist?list=PL9rZj4Mea7wOZkdajK1TsprRg8iUf51BS
  • Keywords: Active tracking, reinforcement learning, adversarial learning, multi agent
  • TL;DR: We propose AD-VAT, where the tracker and the target object, viewed as two learnable agents, are opponents and mutually strengthen each other during training.
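
The abstract names a partial zero-sum reward but does not give its form. As a rough illustration only, the sketch below shows one plausible reading: the target receives the negative of the tracker's reward (the zero-sum part), plus an extra penalty once it moves beyond some range of the tracker, so that simply running far away is not rewarded. The function name, the distance-based penalty, and the parameters `max_distance` and `penalty_scale` are all hypothetical and are not taken from the paper.

```python
def partial_zero_sum_rewards(tracker_reward, target_distance,
                             max_distance, penalty_scale=1.0):
    """Return (tracker_reward, target_reward) for one step.

    Hypothetical sketch: the penalty form and all parameter names
    are assumptions for illustration, not the authors' definition.
    """
    # Zero-sum part: the target gains exactly what the tracker loses.
    target_reward = -tracker_reward

    # Non-zero-sum part (assumed): penalize the target for the distance
    # by which it exceeds max_distance; zero when it stays in range.
    excess = max(target_distance - max_distance, 0.0)
    target_reward -= penalty_scale * excess

    return tracker_reward, target_reward
```

Under these assumptions, a target within range gets the exact negation of the tracker's reward, while a target far outside the range sees its reward reduced even when the tracker is doing poorly.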