Efficient Asynchronous Stochastic Gradient Algorithm with Structured Data

Published: 06 Mar 2025, Last Modified: 24 Apr 2025, FPI-ICLR2025 Poster, CC BY 4.0
Keywords: Deep learning; Stochastic gradient
Abstract: Deep neural networks have demonstrated remarkable success across various domains, including computer vision, natural language processing, and bioinformatics. However, the increasing depth and complexity of these networks have led to significant computational and storage challenges. While prior research has addressed these issues through techniques such as network pruning and the use of high-dimensional data structures like locality-sensitive hashing (LSH) and space-partitioning trees, the computational cost per iteration during training remains linear in the data dimension $d$. In this work, we explore the potential of leveraging special structures in the input data to reduce this cost. Specifically, we consider input data points that can be represented as tensor products of lower-dimensional vectors, a common scenario in applications such as bioinformatics, click-through rate prediction, and computer vision. We present a novel stochastic gradient descent algorithm that, under mild assumptions on the input data structure, achieves a per-iteration training cost that is sublinear in the data dimension $d$. To the best of our knowledge, this is the first work to achieve such a result, marking a significant advancement in the efficiency of training deep neural networks. Our theoretical findings are supported by a formal theorem, demonstrating that the proposed algorithm can train a two-layer fully connected neural network with a per-iteration cost independent of $d$.
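The following is a minimal sketch, not the paper's algorithm, of the kind of input structure the abstract describes: a data point $x \in \mathbb{R}^d$ that factors as a tensor (Kronecker) product of lower-dimensional vectors. The dimensions and the rank-1 weight vector are illustrative assumptions; the sketch only shows how such structure lets an inner product be evaluated without ever forming the full $d$-dimensional vector, via the identity $(a \otimes b)^\top (u \otimes v) = (a^\top u)(b^\top v)$.

```python
import numpy as np

# Illustrative sketch (not the paper's method): an input x in R^d that
# factors as x = u ⊗ v with u in R^{d1}, v in R^{d2}, d = d1 * d2.
# For a rank-1 weight w = a ⊗ b, the mixed-product identity
# (a ⊗ b)^T (u ⊗ v) = (a^T u)(b^T v) gives the inner product in
# O(d1 + d2) time instead of O(d).

d1, d2 = 64, 128  # hypothetical factor dimensions
rng = np.random.default_rng(0)
u, v = rng.standard_normal(d1), rng.standard_normal(d2)
a, b = rng.standard_normal(d1), rng.standard_normal(d2)

# Naive route: materialize the full d-dimensional vectors, O(d) cost.
x_full = np.kron(u, v)
w_full = np.kron(a, b)
naive = w_full @ x_full

# Structured route: work only with the low-dimensional factors.
fast = (a @ u) * (b @ v)

assert np.allclose(naive, fast)
print(f"inner product: {fast:.6f} (d = {d1 * d2}, factor cost ~ {d1 + d2})")
```

In this toy setting the savings come purely from the factored representation; the paper's contribution is a stochastic gradient algorithm whose per-iteration training cost is sublinear in $d$ under such structural assumptions.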
Submission Number: 38