Bitpaths: Compressing Datasets Without Decreasing Predictive Performance

Loren Nuyts, Laurens Devos, Wannes Meert, Jesse Davis

Published: 01 Jan 2022, Last Modified: 05 Aug 2024, PKDD/ECML Workshops (1) 2022, CC BY-SA 4.0

Abstract: The ever-growing amount of available data requires ever more memory to store it. Machine-learned models are becoming increasingly sophisticated and efficient in order to navigate this growing amount of data. However, not all data is relevant to a given machine learning task, and storing irrelevant data wastes memory and power. To address this, we propose bitpaths: a novel pattern-based method that compresses datasets using a random forest. During inference, a k-nearest-neighbors (KNN) classifier uses the encoded training examples to predict the label of the encoded test example. We empirically compare the predictive performance of bitpaths with that of the uncompressed setting. Our method can achieve compression ratios of up to 80 on datasets with a large number of features without affecting predictive performance.
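The abstract does not spell out how examples are encoded. One plausible reading of the name "bitpaths", assumed here and not taken from the paper, is that each example is replaced by a bit vector recording which root-to-leaf path it follows in every tree of the forest, and that the KNN classifier compares these bit vectors by Hamming distance. The minimal sketch below illustrates only that assumption. It uses a hand-built two-tree forest instead of a trained random forest and relies on the Python standard library alone. `Leaf`, `Split`, `encode`, `hamming` and `knn_predict` are hypothetical names, not the authors' code.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    index: int  # leaf position within its tree; assumed to run 0..n_leaves-1


@dataclass
class Split:
    feature: int
    threshold: float
    left: Tree   # followed when x[feature] <= threshold
    right: Tree  # followed otherwise


Tree = Union[Leaf, Split]


def leaf_of(tree: Tree, x: list[float]) -> int:
    """Follow the root-to-leaf path of x and return the index of the leaf reached."""
    node = tree
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.index


def count_leaves(tree: Tree) -> int:
    if isinstance(tree, Leaf):
        return 1
    return count_leaves(tree.left) + count_leaves(tree.right)


def encode(forest: list[Tree], x: list[float]) -> int:
    """Hypothetical encoding: one bit per leaf of every tree, packed into a single int.
    Exactly one bit is set per tree (the leaf that x's path ends in)."""
    code, offset = 0, 0
    for tree in forest:
        code |= 1 << (offset + leaf_of(tree, x))
        offset += count_leaves(tree)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two encoded examples."""
    return bin(a ^ b).count("1")


def knn_predict(train_codes: list[int], train_labels: list[str], query: int, k: int = 3) -> str:
    """Majority vote among the k training codes closest to the query in Hamming distance."""
    ranked = sorted(range(len(train_codes)), key=lambda i: hamming(train_codes[i], query))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy forest over two features: tree A has 2 leaves, tree B has 3 (5 bits in total).
forest = [
    Split(0, 0.5, Leaf(0), Leaf(1)),
    Split(1, 0.5, Split(0, 0.2, Leaf(0), Leaf(1)), Leaf(2)),
]
train_X = [[0.1, 0.1], [0.3, 0.2], [0.9, 0.9], [0.8, 0.7]]
train_y = ["a", "a", "b", "b"]

train_codes = [encode(forest, x) for x in train_X]
print(train_codes)                                                  # [5, 9, 18, 18]
print(knn_predict(train_codes, train_y, encode(forest, [0.2, 0.1])))  # a
print(knn_predict(train_codes, train_y, encode(forest, [0.7, 0.8])))  # b
```

In this toy each stored example costs one bit per leaf in the forest, independent of how many input features it originally had, which hints at why a path-based encoding could shrink wide datasets. The paper's actual pattern selection and its reported compression ratios are not reproduced by this sketch.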