Leveraging Sparse Input and Sparse Models: Efficient Distributed Learning in Resource-Constrained Environments

Emmanouil Kariotakis; Grigorios Tsagkatakis; Panagiotis Tsakalides; Anastasios Kyrillidis

Published: 20 Nov 2023, Last Modified: 02 Dec 2023, CPAL 2024 (Proceedings Track) Oral

Keywords: sparse neural network training, efficient training

TL;DR: Design and study of a system that leverages sparsity in the input and intermediate layers of a neural network that is trained and operates in a distributed manner on resource-constrained workers.
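Intermediate-layer sparsity in distributed training can be illustrated with Independent Subnetwork Training (IST). In IST, the hidden units of a fully connected layer are split into disjoint groups, and each worker receives and trains only its own group. The following is a minimal pure-Python sketch of that partitioning for a two-layer classification head, not the authors' implementation. It assumes bias-free layers, a ReLU activation, and a fresh random split per round. The function names (`partition_hidden_units`, `extract_subnet`, `subnet_forward`, `merge_subnets`) are hypothetical.

```python
import random

# A weight matrix as a list of rows: W1 is (n_hidden x d_in), W2 is (n_classes x n_hidden).
# Assumption (for illustration): a bias-free two-layer head with ReLU.
Matrix = list[list[float]]


def partition_hidden_units(n_hidden: int, n_workers: int, rng: random.Random) -> list[list[int]]:
    """Randomly split hidden-unit indices into disjoint, near-equal groups, one per worker."""
    idx = list(range(n_hidden))
    rng.shuffle(idx)
    return [sorted(idx[w::n_workers]) for w in range(n_workers)]


def extract_subnet(w1: Matrix, w2: Matrix, units: list[int]) -> tuple[Matrix, Matrix]:
    """Copy a worker's slice: the W1 rows and W2 columns belonging to its hidden units."""
    sub_w1 = [list(w1[u]) for u in units]
    sub_w2 = [[row[u] for u in units] for row in w2]
    return sub_w1, sub_w2


def subnet_forward(x: list[float], sub_w1: Matrix, sub_w2: Matrix) -> list[float]:
    """Logits computed from only the subnet's hidden units (bias-free, ReLU)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in sub_w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in sub_w2]


def merge_subnets(w1: Matrix, w2: Matrix, groups: list[list[int]],
                  subnets: list[tuple[Matrix, Matrix]]) -> None:
    """Write each worker's trained slice back into the full weights, in place.

    The groups are disjoint, so every weight has exactly one owner and no averaging is needed.
    """
    for units, (sub_w1, sub_w2) in zip(groups, subnets):
        for k, u in enumerate(units):
            w1[u] = list(sub_w1[k])
            for r, row in enumerate(w2):
                row[u] = sub_w2[r][k]


# One illustrative IST round with 2 workers and a 6-unit hidden layer.
rng = random.Random(0)
d_in, n_hidden, n_classes, n_workers = 4, 6, 3, 2
w1 = [[0.0] * d_in for _ in range(n_hidden)]
w2 = [[0.0] * n_hidden for _ in range(n_classes)]
groups = partition_hidden_units(n_hidden, n_workers, rng)
subnets = []
for worker, units in enumerate(groups):
    s1, s2 = extract_subnet(w1, w2, units)
    # Placeholder for local training: tag the slice with the worker id instead of running SGD.
    s1 = [[float(worker + 1)] * len(row) for row in s1]
    s2 = [[float(worker + 1)] * len(row) for row in s2]
    subnets.append((s1, s2))
merge_subnets(w1, w2, groups, subnets)
```

Because the split is disjoint and the layers have no biases, summing the logits of all subnets reproduces the full head's logits for the same input and the same weights. Each worker only needs the `len(units) * (d_in + n_classes)` weights of its slice rather than the whole head. The local-training step above is only a placeholder; a real worker would run several optimizer steps on its slice before the merge.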

Abstract: Optimizing for reduced computational and bandwidth resources enables model training in less-than-ideal environments and paves the way for practical and accessible AI solutions. This work studies and designs a system that exploits sparsity in the input layer and the intermediate layers of a neural network, and that is trained and operates in a distributed manner. Focusing on image classification tasks, our system efficiently uses reduced portions of the input image data. Using transfer learning, it employs a pre-trained feature extractor whose encoded representations are fed into selected subnets of the system's final classification module, following the Independent Subnetwork Training (IST) algorithm. In this way, the input and subsequent feedforward layers are trained via sparse "actions", in which input and intermediate features are subsampled and propagated through the forward layers. We conduct experiments on several benchmark datasets, including CIFAR-10, NWPU-RESISC45, and the Aerial Image dataset. The results consistently show appealing accuracy despite sparsity. Surprisingly, there are cases where fixed masks empirically outperform random masks, and the model achieves comparable or even superior accuracy with only a fraction (50% or less) of the original image, which makes it particularly relevant in bandwidth-constrained scenarios. This further highlights the robustness of the learned features extracted by the Vision Transformer (ViT), offering the potential for parsimonious image data representation with sparse models in distributed learning.
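The input sparsity described in the abstract can be illustrated by splitting each image into square patches and keeping only a subset of them. The sketch below is a hypothetical illustration, not the authors' code. It assumes a 224 x 224 RGB image with 8-bit channels, 16 x 16 patches (as in a ViT-B/16 backbone), and a 50% keep ratio. It realizes the "fixed" mask as an evenly strided pattern that is identical for every image and the "random" mask as a fresh sample per image; the masks used in the paper may be constructed differently. How the masked input is fed to the feature extractor (for example, dropping patch tokens or zero-filling) is left out, and all function names are hypothetical.

```python
import random

# Assumptions (hypothetical, for illustration): square RGB images with 8-bit
# channels, split into non-overlapping square patches as in ViT-style models.


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    per_side = image_size // patch_size
    return per_side * per_side


def _n_keep(n_patches: int, keep_ratio: float) -> int:
    """Number of patches to keep for a ratio in (0, 1]; at least one."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    return max(1, int(n_patches * keep_ratio))


def fixed_mask(n_patches: int, keep_ratio: float) -> list[int]:
    """Hypothetical fixed mask: evenly strided patch indices, identical for every image."""
    n_keep = _n_keep(n_patches, keep_ratio)
    stride = n_patches / n_keep
    return [int(i * stride) for i in range(n_keep)]


def random_mask(n_patches: int, keep_ratio: float, rng: random.Random) -> list[int]:
    """Random mask: a fresh subset of patch indices drawn for each image."""
    return sorted(rng.sample(range(n_patches), _n_keep(n_patches, keep_ratio)))


def apply_mask(patches: list, mask: list[int]) -> list:
    """Keep only the selected patches (the only ones that would be sent to a worker)."""
    return [patches[i] for i in mask]


def payload_bytes(n_kept: int, patch_size: int, channels: int = 3) -> int:
    """Raw size of the kept patches, assuming one byte per channel value."""
    return n_kept * patch_size * patch_size * channels


n = num_patches(224, 16)                  # 14 x 14 = 196 patches
rng = random.Random(0)
fm = fixed_mask(n, 0.5)                   # [0, 2, 4, ..., 194]
rm = random_mask(n, 0.5, rng)             # a different subset on each call
patches = [f"p{i}" for i in range(n)]     # stand-ins for real pixel patches
kept = apply_mask(patches, fm)
print(n, len(fm), len(rm))                # 196 98 98
print(kept[:3])                           # ['p0', 'p2', 'p4']
print(payload_bytes(len(fm), 16), "of", payload_bytes(n, 16), "bytes")  # 75264 of 150528 bytes
```

Under these assumptions, a 50% mask halves the raw payload a worker must receive, from 150528 to 75264 bytes per image. A fixed mask also means the kept patch positions never need to be communicated, whereas a random mask requires sharing either the indices or the random seed.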

Track Confirmation: Yes, I am submitting to the proceeding track.

Submission Number: 46