DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor SearchDownload PDFOpen Website

Published: 01 Jan 2023, Last Modified: 17 Mar 2024MICRO 2023Readers: Everyone
Abstract: Embedding retrieval is a crucial task for recommendation systems. Graph-based approximate nearest neighbor search (GANNS) is the most commonly used method for retrieval, and achieves the best performance on billion-scale datasets. Unfortunately, the existing CPU- and GPU-based GANNS systems are difficult to optimize the throughput under the latency constraints on billion-scale datasets, due to the underutilized local memory bandwidth (5-45%) and the expensive remote data access overhead (∼ 85% of the total latency). In this paper, we first introduce a practically ideal GANNS architecture for billion-scale datasets, which facilitates a detailed analysis of the challenges and characteristics of distributed GANNS systems. Then, at the architecture level, we propose DF-GAS, a Distributed FPGA-as-a-Service (FPaaS) architecture for accelerating billion-scale Graph-based Approximate nearest neighbor Search. DF-GAS uses a feature-packing memory access engine and a data prefetching and delayed processing scheme to increase local memory bandwidth by 36-42% and reduce remote data access overhead by 76.2%, respectively. At the system level, we exploit the “full-graph + sub-graph” hybrid parallel search scheme on distributed FPaaS system. It achieves million-level query-per-second with sub-millisecond latency on billion-scale GANNS for the first time. Extensive evaluations on million-scale and billion-scale datasets show that DF-GAS achieves an average of 55.4 ×, 32.2 ×, 5.4 ×, and 4.4 × better latency-bounded throughput than CPUs, GPUs, and two state-of-the-art ANNS architectures, i.e., ANNA [23] and Vstore [27], respectively.
0 Replies

Loading