Accelerating Attention Based Models via HW-SW Co-Design using Fine-Grained Sparsification

Published: 16 May 2023, Last Modified: 23 Jun 2023 | ASSYST Oral | Readers: Everyone
Keywords: Attention Models, Sparsity, Model Training Recipe, Microarchitecture, N:M Sparsity
TL;DR: A new accelerator and training technique for N:M structured sparsity.
Abstract: This paper proposes FIne-Grained Sparsification (FIGS), a novel architecture for accelerating attention-based models using N:M structured sparsity. Existing hardware accelerators focus on optimizing compute to achieve ideal processing element (PE) utilization but ignore the implications of higher input bandwidth. FIGS overcomes this challenge by leveraging techniques like grouping and reusing input data to reduce required input bandwidth, achieving high PE utilization while minimizing on-chip interconnect area. The paper also proposes FIGS-Train, a sparsity training recipe that improves the accuracy of N:M structured sparse attention models.
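Note: For readers unfamiliar with N:M structured sparsity, the sketch below illustrates the general idea behind the pruning pattern the abstract refers to: within every group of M consecutive weights, at most N are kept non-zero. This is a minimal, hedged illustration assuming standard magnitude-based pruning; the function name `nm_prune` and the 2:4 default are illustrative and not taken from the FIGS paper or FIGS-Train recipe.

```python
import numpy as np

def nm_prune(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n largest-magnitude weights in each group of m consecutive
    weights along the last dimension; zero out the rest (N:M sparsity)."""
    orig_shape = weights.shape
    assert orig_shape[-1] % m == 0, "last dimension must be divisible by m"
    groups = weights.reshape(-1, m)                      # (num_groups, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop_idx = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop_idx, False, axis=1)
    return (groups * mask).reshape(orig_shape)

# Example: 2:4 sparsity on a small weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
w_sparse = nm_prune(w, n=2, m=4)
print(w_sparse)  # every group of 4 consecutive weights has exactly 2 zeros
```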
Workshop Track: ASSYST
Presentation: In-Person
Presenter Full Name: Abhimanyu Bambhaniya
Presenter Email: abambhaniya3@gatech.edu
Presenter Bio: Abhimanyu is a Ph.D. student at Georgia Tech. His research focuses on efficient accelerators and NoCs for AI and related emerging applications, with a special focus on accelerating attention-based models. He is also highly interested in deep learning algorithms, and sparsity in DL.
Slides: pdf