Keywords: Efficient ViT, Structural Reparameterization, FFN Acceleration
Abstract: We reveal that feed-forward network (FFN) layers contribute significantly to the latency of Vision Transformers (ViTs). This effect grows rapidly with model size, making FFN layers a prime target for efficiency optimization via structural reparameterization. However, directly reparameterizing the linear projection weights is difficult because of the non-linear activation between them. In this work, we propose an innovative channel idle mechanism that establishes a linear pathway through the activation function, enabling structural reparameterization of FFN layers during inference. Building on this mechanism, we present a family of efficient ViTs called **RePa**rameterizable Vision Trans**Formers** (RePaFormers). This technique brings remarkable latency reductions with small sacrifices (and sometimes gains) in accuracy across the various MetaFormer-structured architectures investigated in our experiments. The benefits scale consistently with model size, with efficiency improvements increasing and performance gaps narrowing as models grow. Specifically, the RePaFormer variants for DeiT-Base and Swin-Base achieve 67.5% and 49.7% throughput accelerations with minor changes in top-1 accuracy (-0.4% and -0.9%), respectively. The gains are even more pronounced on larger ViT models: the RePaFormer variants for ViT-Large and ViT-Huge enjoy 66.8% and 68.7% inference speed-ups with +1.7% and +1.1% higher top-1 accuracy, respectively. To the best of our knowledge, RePaFormer is the first work to employ structural reparameterization on FFN layers to accelerate ViTs, and we believe it represents an auspicious direction for efficient ViTs. Code is provided in the supplementary material.
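For intuition, here is a minimal PyTorch sketch of how a channel idle mechanism could leave part of an FFN linear so that the two projections on that path fold into a single matrix at inference. This is not the paper's implementation: the class name `ChannelIdleFFN`, the `idle_ratio` parameter, and the particular channel split are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ChannelIdleFFN(nn.Module):
    """Hypothetical channel-idle FFN (names and split are assumptions, not the paper's code).

    The last `hidden_dim - num_active` hidden channels skip the activation ("idle"),
    so that path stays linear end-to-end and can be reparameterized for inference.
    """

    def __init__(self, dim: int, hidden_dim: int, idle_ratio: float = 0.5):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.act = nn.GELU()
        self.num_active = int(hidden_dim * (1.0 - idle_ratio))  # channels that pass through GELU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.fc1(x)
        active, idle = h[..., : self.num_active], h[..., self.num_active :]
        return self.fc2(torch.cat([self.act(active), idle], dim=-1))

    @torch.no_grad()
    def reparameterize(self):
        """Fold the idle (linear) path into a single dim x dim matrix.

        Returns (fc1_active, fc2_active, shortcut) such that
        fc2_active(gelu(fc1_active(x))) + shortcut(x) equals forward(x).
        """
        k = self.num_active
        W1, b1 = self.fc1.weight, self.fc1.bias  # (hidden_dim, dim), (hidden_dim,)
        W2, b2 = self.fc2.weight, self.fc2.bias  # (dim, hidden_dim), (dim,)

        merged_W = W2[:, k:] @ W1[k:, :]         # (dim, dim): the two idle projections collapse
        merged_b = b2 + W2[:, k:] @ b1[k:]

        fc1_active = nn.Linear(W1.shape[1], k)
        fc1_active.weight.copy_(W1[:k, :]); fc1_active.bias.copy_(b1[:k])
        fc2_active = nn.Linear(k, W2.shape[0])
        fc2_active.weight.copy_(W2[:, :k]); fc2_active.bias.copy_(merged_b)
        shortcut = nn.Linear(W1.shape[1], W2.shape[0])
        shortcut.weight.copy_(merged_W); shortcut.bias.zero_()
        return fc1_active, fc2_active, shortcut


# Quick equivalence check on random input (the reparameterization is exact up to float error).
ffn = ChannelIdleFFN(dim=64, hidden_dim=256).eval()
x = torch.randn(2, 16, 64)
f1, f2, sc = ffn.reparameterize()
assert torch.allclose(ffn(x), f2(ffn.act(f1(x))) + sc(x), atol=1e-5)
```

Under this sketch's assumptions (4x expansion, idle ratio 0.5), the nonlinear branch shrinks from 4d to 2d hidden channels while the folded shortcut adds only a single d x d projection, which is where the inference-time saving would come from.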
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1588