Transformers at a Fraction

Published: 06 Nov 2024, Last Modified: 16 Nov 2024, NLDL 2025 Oral, CC BY 4.0
Keywords: transformers, quaternion neural networks, LTH, pruning, parameter reduction
TL;DR: Reducing the number of transformer parameters by finding lottery tickets in models with quaternion weights, without compromising performance
Abstract: Transformer-based large models, such as GPT, are known for their strong performance across a wide range of tasks. However, these models typically contain a very large number of parameters, which must be trained to reach high performance. As a result, they cannot be run locally on devices with limited memory, such as mobile phones, and must instead be used remotely by sending data to the cloud, which raises privacy concerns about transmitting confidential data to a server, among other issues. In this work, we propose a method to make these large models runnable on devices with much smaller memory while sacrificing little to no performance. We investigate quaternion neural networks, which, when employed efficiently, can reduce the number of parameters to one-fourth of that of the original real-valued model. Additionally, we explore sparse networks created by pruning weights as a method of parameter reduction, following the Lottery Ticket Hypothesis. We perform experiments on vision and language tasks using their respective datasets and observe that pruned quaternion models outperform their real-valued counterparts under severely sparse conditions.
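The following is a minimal sketch, not the authors' implementation, of the two ideas the abstract combines: a quaternion linear layer whose weight matrix is assembled via the Hamilton product from four shared real blocks (giving roughly one-fourth the parameters of a standard dense layer), and a simple LTH-style global magnitude-pruning mask. PyTorch is assumed, and the names `QuaternionLinear` and `lth_magnitude_masks` are hypothetical.

```python
import torch
import torch.nn as nn


class QuaternionLinear(nn.Module):
    """Illustrative quaternion-valued linear layer (not the paper's code).

    Four real weight blocks of shape (out/4, in/4) are shared across the
    full (out, in) matrix via the Hamilton-product block structure, so the
    layer stores ~1/4 the weights of an equally sized nn.Linear.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        assert in_features % 4 == 0 and out_features % 4 == 0
        shape = (out_features // 4, in_features // 4)
        self.r = nn.Parameter(torch.randn(shape) * 0.02)
        self.i = nn.Parameter(torch.randn(shape) * 0.02)
        self.j = nn.Parameter(torch.randn(shape) * 0.02)
        self.k = nn.Parameter(torch.randn(shape) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r, i, j, k = self.r, self.i, self.j, self.k
        # Hamilton-product block matrix: the same four blocks appear in
        # every block row/column, which is where the 4x saving comes from.
        weight = torch.cat([
            torch.cat([r, -i, -j, -k], dim=1),
            torch.cat([i,  r, -k,  j], dim=1),
            torch.cat([j,  k,  r, -i], dim=1),
            torch.cat([k, -j,  i,  r], dim=1),
        ], dim=0)
        return x @ weight.t() + self.bias


def lth_magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    """One-shot global magnitude pruning in the LTH style (sketch only).

    Returns a boolean mask per weight matrix that keeps the largest-magnitude
    (1 - sparsity) fraction of weights; biases are left untouched.
    """
    all_w = torch.cat([p.detach().abs().flatten()
                       for _, p in model.named_parameters() if p.dim() > 1])
    k = min(int(sparsity * all_w.numel()), all_w.numel() - 1)
    threshold = all_w.sort().values[k]
    return {name: p.detach().abs() > threshold
            for name, p in model.named_parameters() if p.dim() > 1}


if __name__ == "__main__":
    # Parameter count comparison: 512x512 real layer vs. quaternion layer.
    real = nn.Linear(512, 512)
    quat = QuaternionLinear(512, 512)
    print(sum(p.numel() for p in real.parameters()))   # 262656
    print(sum(p.numel() for p in quat.parameters()))   # 66048 (~1/4)

    # Masks for a 90%-sparse "ticket"; applying them (and rewinding weights
    # to an early checkpoint before retraining) is the usual LTH recipe.
    masks = lth_magnitude_masks(quat, sparsity=0.9)
```

In the paper's setting, these quaternion layers would replace the dense projections inside a transformer, and the pruning masks would be applied on top of them, which is how the abstract's "pruned quaternion models" combine the two sources of parameter reduction.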
Permission: pdf
Submission Number: 31