BiViT: Exploring Binary Vision Transformers

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission
Keywords: quantization, binary quantization, vision transformer, distillation, classification, imagenet
TL;DR: We introduce BiViT, the first fully binarized vision transformer with both 1-bit weights and activations.
Abstract: We introduce BiViT, a Binary Vision Transformer that tackles the extremely difficult problem of quantizing both the weights and activations of a ViT model to just 1 bit. Initially, we observe that the techniques used to binarize transformers in NLP do not work on Vision Transformers (ViTs). To address this, we introduce simple yet critical architectural changes, improving 28% over a baseline binarized ViT. Then, we improve a further 11% over from-scratch training by employing our normalized BiViT distillation scheme, which we find to be crucial for dense distillation in vision. Overall, BiViT achieves a 58x reduction in operations and a 20x compression in model size, while bringing top-1 accuracy on ImageNet-1k in line with similar benchmarks for binary transformers in NLP. We hope BiViT can be the first step toward even more powerful binary ViT models.
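The abstract's core operation is 1-bit quantization of both weights and activations. The paper's exact scheme (and its architectural changes) is not given above, so the following is only a generic illustrative sketch of sign-based binarization with an XNOR-Net-style per-layer scaling factor; the function names and the scaling choice are assumptions, not BiViT's method.

```python
import numpy as np

def binarize(x):
    """Map a real-valued tensor to {-1.0, +1.0} via the sign function.
    This is the generic 1-bit quantizer; BiViT's actual quantizer may differ."""
    return np.where(x >= 0, 1.0, -1.0)

def binary_linear(x, w, alpha=None):
    """A 1-bit linear layer: binarize both activations and weights,
    then rescale by the mean absolute weight (an XNOR-Net-style factor,
    assumed here) to recover some of the lost dynamic range."""
    if alpha is None:
        alpha = np.abs(w).mean()
    xb = binarize(x)          # 1-bit activations
    wb = binarize(w)          # 1-bit weights
    return (xb @ wb.T) * alpha

# Toy usage: a (2, 8) activation batch through a binarized 8->4 layer.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((4, 8))
y = binary_linear(x, w)
```

Because the matmul operands are restricted to {-1, +1}, it can be implemented with XNOR and popcount instructions, which is the source of the operation-count reduction the abstract reports.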
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning