BiViT: Exploring Binary Vision Transformers

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission
Keywords: quantization, binary quantization, vision transformer, distillation, classification, imagenet
TL;DR: We introduce BiViT, the first fully binarized vision transformer with both 1-bit weights and activations.
Abstract: We introduce BiViT, a Binary Vision Transformer that tackles the extremely difficult problem of quantizing both the weights and activations of a ViT model to just 1 bit. Initially, we observe that the techniques used to binarize transformers in NLP do not work on Vision Transformers (ViTs). To address this, we introduce simple yet critical architectural changes, improving 28% over a baseline binarized ViT. Then, we improve a further 11% over from-scratch training by employing our normalized BiViT distillation scheme, which we find to be crucial for dense distillation in vision. Overall, BiViT achieves a 58x reduction in operations and a 20x compression in model size, while bringing top-1 accuracy on ImageNet-1k in line with similar benchmarks for binary transformers in NLP. We hope BiViT can be the first step toward even more powerful binary ViT models.
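The abstract's core operation is 1-bit quantization of both weights and activations. The paper's exact scheme (and its architectural changes) is not given above, so the following is only a generic illustrative sketch of sign-based binarization with an XNOR-Net-style per-layer scaling factor; the function names and the scaling choice are assumptions, not BiViT's method.

```python
import numpy as np

def binarize(x):
    """Map a real-valued tensor to {-1.0, +1.0} via the sign function.
    This is the generic 1-bit quantizer; BiViT's actual quantizer may differ."""
    return np.where(x >= 0, 1.0, -1.0)

def binary_linear(x, w, alpha=None):
    """A 1-bit linear layer: binarize both activations and weights,
    then rescale by the mean absolute weight (an XNOR-Net-style factor,
    assumed here) to recover some of the lost dynamic range."""
    if alpha is None:
        alpha = np.abs(w).mean()
    xb = binarize(x)          # 1-bit activations
    wb = binarize(w)          # 1-bit weights
    return (xb @ wb.T) * alpha

# Toy usage: a (2, 8) activation batch through a binarized 8->4 layer.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((4, 8))
y = binary_linear(x, w)
```

Because the matmul operands are restricted to {-1, +1}, it can be implemented with XNOR and popcount instructions, which is the source of the operation-count reduction the abstract reports.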
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning