Fast Multi-Column Sorting in Main-Memory Column-Stores

Published: 2016, Last Modified: 09 Nov 2024SIGMOD Conference 2016EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Sorting is a crucial operation that could be used to implement SQL operators such as GROUP BY, ORDER BY, and SQL:2003 PARTITION BY. Queries with multiple attributes in those clauses are common in real workloads. When executing queries of that kind, state-of-the-art main-memory column-stores require one round of sorting per input column. With the advent of recent fast scans and denormalization techniques, that kind of multi-column sorting could become a bottleneck. In this paper, we propose a new technique called "code massaging", which manipulates the bits across the columns so that the overall sorting time can be reduced by eliminating some rounds of sorting and/or by improving the degree of SIMD data level parallelism. Empirical results show that a main-memory column-store with code massaging can achieve speedup of up to 4.7X, 4.7X, 4X, and 3.2X on TPC-H, TPC-H skew, TPC-DS, and real workload, respectively.
Loading