Towards Exact Gradient-based Training on Analog In-memory Computing

Published: 25 Sept 2024, Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY 4.0
Keywords: Analog AI; in-memory computing; stochastic gradient descent; stochastic optimization
TL;DR: Our paper establishes a theoretical foundation for model training on analog devices and shows that a heuristic algorithm, Tiki-Taka, converges exactly to a critical point.
Abstract: Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, training on such hardware remains underexplored. Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence issue of SGD, which is caused by asymmetric updates on analog devices. We then provide a lower bound on the asymptotic error, showing that this performance limit of SGD-based analog training is fundamental rather than an artifact of our analysis. To address this issue, we study a heuristic analog algorithm called Tiki-Taka that has recently exhibited superior empirical performance compared to SGD. We rigorously show that Tiki-Taka converges exactly to a critical point, thereby eliminating the asymptotic error. Simulations verify the correctness of the analyses.
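To make the two behaviors described in the abstract concrete, the sketch below contrasts them on a 1-D least-squares toy problem with noisy gradients. It is a minimal illustration, not the authors' code: the soft-bound device model `analog_update`, the problem `f(w) = 0.5 * (w - w_star)^2`, and all parameter choices (`tau`, `lr`, `sigma`, `gamma`, `transfer_every`) are illustrative assumptions and may differ from the paper's exact device model and algorithm details.

```python
# Sketch (assumptions as stated above): analog SGD under an asymmetric,
# state-dependent device response vs. a Tiki-Taka-style scheme that
# accumulates gradients on an auxiliary analog variable p and periodically
# transfers a fraction of p to the main weight w.
import numpy as np

rng = np.random.default_rng(0)
w_star = 0.8      # minimizer of the toy objective
tau = 1.0         # device "soft bound" (symmetry point at 0)
lr = 0.05         # learning rate
sigma = 0.1       # std of stochastic-gradient noise
steps = 30_000

def analog_update(x, dx):
    """Asymmetric device write: the realized increment shrinks as x moves
    toward +tau for positive pulses and toward -tau for negative pulses."""
    return x + dx - abs(dx) * x / tau

def stoch_grad(w):
    return (w - w_star) + sigma * rng.standard_normal()

# --- Analog SGD: settles away from w_star (an asymptotic error) -----------
w, trace_sgd = 0.0, []
for _ in range(steps):
    w = analog_update(w, -lr * stoch_grad(w))
    trace_sgd.append(w)

# --- Tiki-Taka-style: accumulate on p, transfer to w every few steps ------
w, p, trace_tt = 0.0, 0.0, []
gamma, transfer_every = 0.1, 5
for t in range(steps):
    p = analog_update(p, -lr * stoch_grad(w))
    if (t + 1) % transfer_every == 0:
        w = analog_update(w, gamma * p)
    trace_tt.append(w)

tail = steps // 3  # average the last third of the iterates
print(f"target w*              : {w_star:.3f}")
print(f"analog SGD  (avg tail) : {np.mean(trace_sgd[-tail:]):.3f}")
print(f"Tiki-Taka   (avg tail) : {np.mean(trace_tt[-tail:]):.3f}")
```

Under these assumptions, the noise interacting with the asymmetric write biases plain analog SGD away from `w_star`, while the Tiki-Taka-style variant, whose auxiliary variable settles near the device's symmetry point once the gradient averages to zero, tracks `w_star` much more closely.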
Primary Area: Optimization (convex and non-convex, discrete, stochastic, robust)
Submission Number: 7919