Multithreaded Layer-wise Training of Sparse Deep Neural Networks using Compressed Sparse Column

Mohammad Hasanzadeh-Mofrad, Rami G. Melhem, Muhammad Yousuf Ahmad, Mohammad Hammoud

Published: 2019, Last Modified: 08 Jan 2026HPEC 2019EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Training a sparse Deep Neural Network (DNN) is inherently less memory-intensive and processor-intensive compared to training a dense (fully-connected) DNN. In this paper, we utilize Sparse Matrix-Matrix Multiplication (SpMM) to train sparsely-connected DNNs as opposed to dense matrix-matrix multiplication used for training dense DNNs. In our C/C++ implementation, we extensively use in-memory Compressed Sparse Column (CSC) data structures to store and traverse the neural network layers. Also, we train the neural network layer by layer, and within each layer we use 1D-Column partitioning to divide the computation required for training among threads. To speedup the computation, we apply the bias and activation functions while executing SpMM operations. We tested our implementation using benchmarks provided by MIT/IEEE/Amazon HPEC graph challenge [1]. Based on our results, our single thread (1 core) and multithreaded (12 cores) implementations are up to 22×, and 150× faster than the serial Matlab results provided by the challenge. We believe this speedup is due to the 1D-Column partitioning that we use to balance the computation of SpMM operations among computing threads, the efficient mechanism that we use for memory (re)allocation of sparse matrices, and the overlapping of the accumulation of SpMM results with the application of the bias and activation functions.

External IDs:dblp:conf/hpec/Hasanzadeh-Mofrad19