Abstract: Sorting is a key part in database operators (like duplicate elimination, sort-merge joins and group-by aggregations). Sorting billions of records in a fast and energy efficient manner has become a key research challenge. In this work, we explore sorting in-memory using a parallel version of Radix Sort to build a high-performance hardware accelerator, called HARS (Hardware Accelerated Radix Sort). Our design enables dividing the unsorted dataset among parallel engines without the need for a merge step. HARS is implemented on Micron’s SB-852 FPGA board. The proposed accelerator provides high throughput in-memory sorting at a rate of 44 Million 128-bit records per second. HARS is 1.4x faster than CPU and 1.36x faster than GPU when GPU bandwidth is normalized. Projected performance of a proposed board with a more capable FPGA chip would yield 1.25x higher throughput.
Loading