Keywords: efficient vision transformer, dynamic token selection, mappo
TL;DR: Reinforcement learning for efficient transformer with dynamic token selection.
Abstract: Vision Transformers (ViT) have revolutionized the field of computer vision by
leveraging self-attention mechanisms to process images. However, the computational
cost of ViT increases quadratically with the number of tokens. Dynamic token selection methods which aims to reduce computational cost by discard redundant tokens during inference, are primarily based on non-differentiable binary decisions methods and relaxations methods. However, Reinforcement Learning( (RL) based methods, which have astonishing decision-making ability, is considered to have high variance and high bias, not adopted for dynamic token selection task in previous work. Yet, RL-based methods have been successfully applied to many binary decision problems such as neural pruning, routing, path selection. In this paper, we propose Reinforcement Learning for Dynamic Vision Transformer (RL4DViT), a novel framework for the dynamic token selection task in ViT using RL. By harnessing the powerfull decision-making capabilities of Multi-Agent Reinforcement Learning(MARL) algorithms, our method dynamically prunes redundant tokens based on input complexity, significantly
reducing the computational cost while maintaining high accuracy. Extensive experiments
on the ImageNet dataset indicate that our approach reduces the computational cost by
up to 39%, with only a 0.17% decrease in accuracy. To the best of our knowledge,
this is the first RL-based token selection method for efficient ViT.
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10710
Loading