Lightweight Vision Transformers for Low Energy Edge Inference

Published: 30 May 2024, Last Modified: 16 Jun 2024 · MLArchSys 2024 Oral · CC BY 4.0
Workshop Track: System for Machine Learning
Presentation: Virtual
Keywords: Vision Transformers, Edge, FPGA, Weightless Neural Networks, LUTs, Machine Learning
Presenter Full Name: Shashank Nag
TL;DR: Lightweight vision transformers that fuse aspects of weightless neural networks into vision transformers, achieving substantial energy savings
Presenter Email: shashanknag@utexas.edu
Abstract: Vision Transformer models have achieved increasingly strong results in recent years. However, their computational demands make them infeasible to deploy on edge devices with latency and energy constraints. Weightless Neural Networks (WNNs) are look-up-table-based models that differ from conventional Deep Neural Networks and offer a low-latency, low-energy alternative. In this work, we combine aspects of vision transformers and weightless neural networks to design Lightweight Vision Transformers that are efficient for edge inference, striking a desirable trade-off between the hardware requirements of transformers and the accuracy they achieve. We analyze the I-ViT-T vision transformer variant and observe that roughly 57% of its computations lie within the Multi-Layer Perceptron (MLP) layers. We estimate the hardware savings from replacing these layers with our proposed weightless layers and evaluate the resulting models for accuracy. Preliminary results with the I-ViT-T model suggest that the weightless layers introduced in place of the MLP layers deliver a significant speedup at a lower hardware resource requirement, compared to a systolic-array-based accelerator implementation of the MLPs. When evaluated on end-to-end performance, this model variant reduces energy per inference by 2.9x over the baseline model, at the cost of about a 6% drop in accuracy on the CIFAR-10 dataset. We are continuing efforts to improve model accuracy, extend this work to larger transformer variants and benchmarks, and further optimize hardware resource consumption.
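
To make the substitution concrete, below is a minimal PyTorch sketch of a transformer encoder block whose MLP is replaced by a LUT-based weightless layer. This is an illustration of the general idea, not the authors' implementation: the `WeightlessLUTLayer` class, its zero-threshold binarization, the table shapes, and the dim=192/3-head configuration (DeiT-Tiny-like, assumed here for I-ViT-T) are all assumptions, and training such a layer would additionally require something like a straight-through estimator for the hard thresholding, which is omitted for brevity.

```python
import torch
import torch.nn as nn


class WeightlessLUTLayer(nn.Module):
    """Illustrative weightless (LUT-based) layer: binarizes its input and maps
    each fixed-size tuple of bits to a learned table entry. A sketch only; not
    the paper's design."""

    def __init__(self, in_features, out_features, bits_per_lut=6):
        super().__init__()
        assert in_features % bits_per_lut == 0
        self.bits = bits_per_lut
        self.num_luts = in_features // bits_per_lut
        # One 2^bits-entry table per LUT; each entry is an out_features vector.
        self.tables = nn.Parameter(
            0.02 * torch.randn(self.num_luts, 2 ** bits_per_lut, out_features))
        # Powers of two used to turn each bit tuple into a table index.
        self.register_buffer("pow2", 2 ** torch.arange(bits_per_lut))

    def forward(self, x):
        # Hard threshold at zero (a straight-through estimator would be
        # needed to train through this step).
        bits = (x > 0).long()                                # (..., in_features)
        bits = bits.view(*x.shape[:-1], self.num_luts, self.bits)
        idx = (bits * self.pow2).sum(-1)                     # (..., num_luts)
        # Look up each LUT's output vector and sum across LUTs.
        lut_ids = torch.arange(self.num_luts, device=x.device)
        out = self.tables[lut_ids, idx]                      # (..., num_luts, out)
        return out.sum(dim=-2)


class LightweightBlock(nn.Module):
    """Encoder block mirroring the abstract's substitution: attention is kept,
    the two-layer MLP is swapped for a weightless layer (shapes illustrative)."""

    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.wnn = WeightlessLUTLayer(dim, dim, bits_per_lut=6)  # replaces MLP

    def forward(self, x):                          # x: (batch, tokens, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.wnn(self.norm2(x))


# Usage sketch: one block on a DeiT-Tiny-sized token sequence.
x = torch.randn(1, 197, 192)
y = LightweightBlock()(x)                          # -> (1, 197, 192)
```

At inference, each trained table can be quantized and mapped directly onto FPGA LUT resources, which is where the latency and energy advantage over a systolic-array MLP accelerator would come from.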
Presenter Bio: Shashank Nag is a PhD Student in the Laboratory of Computer Architecture at The University of Texas at Austin. His research interests involve hardware-efficient ML and ML algorithm-hardware accelerator co-design.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
YouTube Link: https://tinyurl.com/mlarchsys-lvt
Dataset Release: I certify that all co-authors commit to release the dataset and necessary scripts to reproduce the presented results.
Workshop Registration: Yes, at least one of the authors has registered for the workshop (Two-Day Registration at minimum).
Submission Number: 12