Keywords: Lightweight Networks, Efficient Networks, Vision Transformers, Classification
Abstract: We present a new family of mobile hybrid vision networks, called iFormer, with a
focus on optimizing latency and accuracy for mobile applications. iFormer effectively
integrates the fast local representation capacity of convolution with the efficient
global modeling ability of self-attention. The local interactions are derived
by transforming a standard convolutional network, i.e., ConvNeXt, into a
more lightweight mobile network. Our newly introduced mobile modulation attention
removes memory-intensive operations in MHA and employs an efficient
modulation mechanism to boost dynamic global representational capacity. We
conduct comprehensive experiments demonstrating that iFormer outperforms existing
lightweight networks across various tasks. Notably, iFormer achieves an
impressive Top-1 accuracy of 80.4% on ImageNet-1k with a latency of only 1.10
ms on an iPhone 13, surpassing the recently proposed MobileNetV4 under similar
latency constraints. Additionally, our method shows significant improvements in
downstream tasks, including COCO object detection, instance segmentation, and
ADE20k semantic segmentation, while still maintaining low latency on mobile
devices for high-resolution inputs in these scenarios. The source code and trained
models will be available soon.
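The abstract's key architectural idea, replacing the memory-intensive N x N attention map of standard MHA with a cheap global modulation gate, can be illustrated with a toy sketch. This is a generic modulation-attention pattern under my own assumptions, not the paper's exact design; all names and shapes here are illustrative.

```python
# Hypothetical sketch of "modulation attention" (not the paper's exact method):
# instead of forming an N x N softmax attention matrix, pool a single global
# context and use it to modulate each value, reducing cost from O(N^2) to O(N).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def modulation_attention(tokens, w_ctx, w_val):
    """tokens: list of N scalar features (1-D toy case).
    w_ctx, w_val: illustrative scalar 'projection' weights."""
    # Global context: mean-pool over tokens, then a linear map and a gate.
    ctx = sigmoid(w_ctx * sum(tokens) / len(tokens))
    # Per-token value projection, modulated by the shared global gate.
    return [ctx * (w_val * t) for t in tokens]

out = modulation_attention([1.0, 2.0, 3.0], w_ctx=0.5, w_val=2.0)
```

The point of the sketch is the complexity argument: every token is touched a constant number of times, so no quadratic attention map is ever materialized, which is what makes such designs attractive at mobile latencies.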
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 938