Keywords: deep learning, reinforcement learning, machine learning, computer vision
TL;DR: We uncover implicit reinforcement learning properties within one of the best supervised transformer-based object detection models and derive from them a novel method that enhances object detection performance by 0.3 AP.
Abstract: We identify the presence of an exploration-exploitation dilemma during the training of DINO, one of the best supervised transformer-based object detection models. To tackle this challenge, we propose a new approach that integrates reinforcement learning into supervised learning. Specifically, we apply the $\varepsilon$-greedy technique directly to the query selection process in DINO, without heavily modifying the original training process. This approach, which requires only a few lines of code, yields a noteworthy improvement of 0.3 AP in the standard configuration (6 encoder/decoder layers, 4 scales, 36 epochs) and a larger improvement of 1.8 AP in the configuration with 2 encoder/decoder layers, 4 scales, and 12 epochs. We attribute these improvements to the implicit reinforcement learning properties inherent in the design of DINO. To substantiate this assertion, we illustrate the presence of implicit reinforcement learning properties within supervised learning by framing the box proposal problem as a multi-armed bandit problem. To demonstrate its viability, we transform Monte Carlo policy gradient control of the multi-armed bandit problem into a supervised learning form through a series of deductive steps. Furthermore, we provide experimental support for our findings by visualizing the improvements achieved by the $\varepsilon$-greedy approach.
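As an illustration only, the sketch below shows one plausible way to apply $\varepsilon$-greedy selection to DINO's top-K query selection; the function name, tensor shapes, and the per-slot randomization are our assumptions for a minimal example, not the submission's exact implementation.

```python
import torch


def epsilon_greedy_topk(enc_scores: torch.Tensor, k: int, eps: float) -> torch.Tensor:
    """Select k proposal indices per image from encoder confidence scores.

    For each query slot, exploit (take a top-k scoring proposal) with
    probability 1 - eps, or explore (take a uniformly random proposal)
    with probability eps. Hypothetical helper, for illustration only.

    enc_scores: (batch, num_proposals) per-proposal confidence scores.
    Returns:    (batch, k) selected proposal indices.
    """
    batch, num_proposals = enc_scores.shape
    # Greedy choice: indices of the k highest-scoring proposals.
    topk_idx = enc_scores.topk(k, dim=1).indices                           # (batch, k)
    # Exploratory choice: uniformly random proposal indices.
    rand_idx = torch.randint(num_proposals, (batch, k), device=enc_scores.device)
    # Per-slot Bernoulli mask deciding explore vs. exploit.
    explore = torch.rand(batch, k, device=enc_scores.device) < eps
    return torch.where(explore, rand_idx, topk_idx)
```

In such a scheme, `eps` would typically be active only during training and set to 0 at evaluation time, so inference remains purely greedy.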
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 799