Abstract: Scan is a crucial operation in main-memory column-stores. It scans a column and returns a result bit vector indicating which records satisfy a filter predicate. ByteSlice is an in-memory data layout that chops data into multiple bytes and exploits early-stop capability by high-order bytes comparisons. As column widths are usually not multiples of byte, the last-byte of ByteSlice is padded with 0's, wasting memory bandwidth and computation power. To fully leverage the resources, we propose to weave a secondary index into the vacant bits (i.e., bits originally padded with 0's), forming our new layout coined DIFusion (Data Index Fusion). DIFusion enables skip-scan, a new fast scan that inherits the early-stopping capability from ByteSlice and at the same time possesses the data-skipping ability of index with zero space overhead. Empirical results show that skip-scan on DIFusion outperforms scan on ByteSlice.
Loading