NEURAL NETWORK COMPRESSION: THE FUNCTIONAL PERSPECTIVE

Published: 05 Mar 2024, Last Modified: 12 May 2024 · PML4LRS Poster · CC BY 4.0
Keywords: Model Compression, Pruning, Quantization, Knowledge Distillation
TL;DR: Functional analysis of compression methods reveals that quantization and pruning can be considered compression mechanisms, while knowledge distillation cannot.
Abstract: Compression techniques such as knowledge distillation, pruning, and quantization reduce the computational cost of model inference and enable machine learning on edge devices. The efficacy of compression is usually evaluated through proxies such as accuracy and loss, which only indirectly capture how similar the compressed model is to the original. This study instead examines the functional divergence between compressed and uncompressed models. The results indicate that quantization and pruning produce models that are functionally similar to the original model. In contrast, knowledge distillation produces students that do not functionally approximate their teachers: the distilled model diverges from the teacher roughly as much as an independently trained model does. From this functional perspective, knowledge distillation is therefore not a compression method. Since no knowledge is distilled from teacher to student, it is better characterised as a training regulariser.
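
As a rough illustration of what a functional comparison between an original and a compressed model might look like, the sketch below measures per-example prediction disagreement and mean KL divergence over a shared evaluation set. These particular metrics, and all names in the code, are illustrative assumptions, not the paper's stated methodology.

# Minimal sketch: functional comparison of two models via their softmax outputs.
# The metrics (disagreement rate, mean KL divergence) are assumed for illustration.
import numpy as np

def functional_divergence(probs_a, probs_b, eps=1e-12):
    """probs_a, probs_b: (num_examples, num_classes) softmax outputs of the
    original and compressed model on the same inputs."""
    # Fraction of inputs where the two models predict different classes.
    disagreement = np.mean(np.argmax(probs_a, axis=1) != np.argmax(probs_b, axis=1))
    # Mean KL(A || B) over the evaluation set.
    kl = np.mean(np.sum(probs_a * (np.log(probs_a + eps) - np.log(probs_b + eps)), axis=1))
    return disagreement, kl

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy usage with synthetic logits standing in for model outputs: a quantized or
# pruned model is expected to show low disagreement with the original, while a
# distilled student may diverge as much as an independently trained model.
rng = np.random.default_rng(0)
logits_orig = rng.normal(size=(1000, 10))
logits_comp = logits_orig + rng.normal(scale=0.1, size=(1000, 10))
print(functional_divergence(softmax(logits_orig), softmax(logits_comp)))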
Submission Number: 16