Practical Shuffle Coding

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY 4.0
Keywords: graph compression, entropy coding, bits-back coding, lossless compression, generative models, information theory, probabilistic models, graph neural networks, multiset compression, asymmetric numeral systems, compression, entropy, shuffle coding
TL;DR: We present a general method for practical lossless compression of unordered data structures that achieves state-of-the-art rates and speeds on large graphs.
Abstract: We present a general method for lossless compression of unordered data structures, including multisets and graphs. It is a variant of shuffle coding that is many orders of magnitude faster than the original and enables 'one-shot' compression of single unordered objects. Our method achieves state-of-the-art compression rates on various large-scale network graphs at speeds of megabytes per second, efficiently handling even a multi-gigabyte plain graph with one billion edges. We release an implementation that can be easily adapted to different data types and statistical models.
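The rate gain of shuffle coding comes from discarding ordering information that an unordered object does not carry: for a multiset of n items with symbol multiplicities m_i this is log2(n!/∏_i m_i!) bits, and for a plain graph on n vertices it is log2(n!) minus log2 of the size of its automorphism group. The sketch below (ours, not the authors' released implementation; the function name and example are illustrative) computes the multiset bound that bits-back shuffle coding aims to recover in practice.

```python
from collections import Counter
from math import factorial, log2

def ordering_information_bits(items):
    """Bits needed to specify an ordering of the multiset `items`:
    log2(n! / prod_i m_i!), where m_i are the symbol multiplicities.
    This is an upper bound on the rate saving of shuffle coding over
    coding the items as an ordered sequence."""
    counts = Counter(items)
    n = sum(counts.values())
    orderings = factorial(n)
    for m in counts.values():
        orderings //= factorial(m)  # exact: the multinomial coefficient is an integer
    return log2(orderings)

# Example: 1,000 items drawn from 10 symbols.
example = [i % 10 for i in range(1000)]
print(f"{ordering_information_bits(example):.1f} bits of ordering information")
```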
Primary Area: Probabilistic methods (for example: variational inference, Gaussian processes)
Submission Number: 15389