Near-Data Processing for Machine Learning

Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon

Nov 05, 2016 (modified: Nov 06, 2016) ICLR 2017 conference submission readers: everyone
  • Abstract: In computer architecture, near-data processing (NDP) refers to augmenting the memory or the storage with processing power so that it can process the data stored therein. By offloading the computational burden of CPU and saving the need for transferring raw data in its entirety, NDP exhibits a great potential for acceleration and power reduction. Despite this potential, specific research activities on NDP have witnessed only limited success until recently, often owing to performance mismatches between logic and memory process technologies that put a limit on the processing capability of memory. Recently, there have been two major changes in the game, igniting the resurgence of NDP with renewed interest. The first is the success of machine learning (ML), which often demands a great deal of computation for training, requiring frequent transfers of big data. The second is the advent of NAND flash-based solid-state drives (SSDs) containing multicore processors that can accommodate extra computation for data processing. Sparked by these application needs and technological support, we evaluate the potential of NDP for ML using a new SSD platform that allows us to simulate in-storage processing (ISP) of ML workloads. Our platform (named ISP-ML) is a full-fledged simulator of a realistic multi-channel SSD that can execute various ML algorithms using the data stored in the SSD. For thorough performance analysis and in-depth comparison with alternatives, we focus on a specific algorithm: stochastic gradient decent (SGD), which is the de facto standard for training differentiable learning machines including deep neural networks. We implement and compare three variants of SGD (synchronous, Downpour, and elastic averaging) using ISP-ML, exploiting the multiple NAND channels for parallelizing SGD. In addition, we compare the performance of ISP and that of conventional in-host processing, revealing the advantages of ISP. Based on the advantages and limitations identified through our experiments, we further discuss directions for future research on ISP for accelerating ML.
  • Conflicts:,