Multi-Agent Reinforcement Learning for Efficient Vision Transformer with Dynamic Token Selection

Lu ChengLong; Wei Wang

27 Sept 2024 (modified: 15 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: efficient vision transformer, dynamic token selection, mappo

TL;DR: Reinforcement learning for efficient transformer with dynamic token selection.

Abstract: Vision Transformers (ViT) have revolutionized the field of computer vision by leveraging self-attention mechanisms to process images. However, the computational cost of ViT increases quadratically with the number of tokens. Dynamic token selection methods, which aim to reduce computational cost by discarding redundant tokens during inference, are primarily based on non-differentiable binary decision methods and relaxation methods. Reinforcement Learning (RL) based methods, despite their strong decision-making ability, are considered to suffer from high variance and high bias and have not been adopted for the dynamic token selection task in previous work. Yet RL-based methods have been successfully applied to many binary decision problems such as neural pruning, routing, and path selection. In this paper, we propose Reinforcement Learning for Dynamic Vision Transformer (RL4DViT), a novel framework for the dynamic token selection task in ViT using RL. By harnessing the powerful decision-making capabilities of Multi-Agent Reinforcement Learning (MARL) algorithms, our method dynamically prunes redundant tokens based on input complexity, significantly reducing the computational cost while maintaining high accuracy. Extensive experiments on the ImageNet dataset indicate that our approach reduces the computational cost by up to 39%, with only a 0.17% decrease in accuracy. To the best of our knowledge, this is the first RL-based token selection method for efficient ViT.
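
To make the quadratic-cost claim concrete, the following is a minimal sketch of how the per-block cost of a transformer encoder scales with the number of tokens, and how much is saved when only a fraction of tokens is kept. It is not the RL4DViT method and does not reproduce the paper's measurements. The function names, the counting convention (multiply-accumulates, with softmax, normalization, and biases ignored), and the default values of 197 tokens and an embedding width of 384 are all assumptions chosen for illustration.

```python
def block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one transformer encoder block.

    Hypothetical helper: ignores softmax, layer norm, and bias terms.
    """
    # Q, K, V and output projections: four (dim x dim) matrices per token.
    projections = 4 * num_tokens * dim * dim
    # QK^T and the attention-weighted sum over V: quadratic in num_tokens.
    attention = 2 * num_tokens * num_tokens * dim
    # Two linear layers with hidden width mlp_ratio * dim.
    mlp = 2 * mlp_ratio * num_tokens * dim * dim
    return projections + attention + mlp


def relative_cost(keep_ratio: float, num_tokens: int = 197, dim: int = 384) -> float:
    """Cost of a block that keeps a fraction of tokens, relative to the full block.

    The defaults are illustrative assumptions, not values from the paper.
    """
    kept = max(1, round(keep_ratio * num_tokens))
    return block_macs(kept, dim) / block_macs(num_tokens, dim)


if __name__ == "__main__":
    for ratio in (1.0, 0.7, 0.5):
        print(f"keep {ratio:.0%} of tokens: relative cost {relative_cost(ratio):.3f}")
```

Because the attention term grows with the square of the token count, dropping tokens reduces cost slightly faster than linearly under this estimate. The saving for a whole network depends on which layers prune tokens and how many they keep.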

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 10710
