Abstract: The continuation of Moore's law scaling, but in the absence of Dennard scaling, motivates an emphasis on energy-efficient accelerator-based designs for future applications. In natural language processing, the conventional approach to automatically analyzing vast text collections - using scale-out processing - incurs high energy and hardware costs, since the central compute-intensive step of similarity measurement often entails pairwise, all-to-all comparisons. We propose a custom hardware accelerator for similarity measures that leverages data streaming, memory latency hiding, and parallel computation across variable-length threads. We evaluate our design through a combination of architectural simulation and RTL synthesis. When executing the dominant kernel in a semantic indexing application for documents, we demonstrate throughput gains of up to 42× and 58× lower energy per similarity computation compared to an optimized software implementation, while requiring less than 1.3% of the area of a conventional core.
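To make the workload concrete, the following is a minimal software sketch of the pairwise, all-to-all similarity kernel the abstract describes. It uses cosine similarity over document vectors as an illustrative measure; the paper's accelerator may target other similarity metrics, and the function and variable names here are hypothetical.

```python
import numpy as np

def pairwise_cosine(docs: np.ndarray) -> np.ndarray:
    """All-to-all cosine similarity between document vectors.

    docs: an (n, d) array of n document vectors of dimension d.
    Returns an (n, n) similarity matrix. This O(n^2 * d) kernel is
    the kind of compute-intensive, all-pairs comparison that
    dominates semantic-indexing workloads.
    """
    # Normalize each row to unit length, then the dot product of
    # unit vectors is exactly the cosine similarity.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    unit = docs / norms
    return unit @ unit.T

# Hypothetical toy corpus of three 2-D document vectors.
vecs = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
sim = pairwise_cosine(vecs)
```

Because every document is compared against every other, the cost grows quadratically with corpus size, which is why a scale-out software approach becomes energy- and hardware-expensive and a dedicated accelerator is attractive.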