Finite-State Autoregressive Entropy Coding for Efficient Learned Lossless Compression

Yufeng Zhang; Hang Yu; Jianguo Li; Weiyao Lin

Finite-State Autoregressive Entropy Coding for Efficient Learned Lossless Compression

Yufeng Zhang, Hang Yu, Jianguo Li, Weiyao Lin

Published: 16 Jan 2024, Last Modified: 10 Apr 2024ICLR 2024 spotlightEveryoneRevisionsBibTeX

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Lossless Compression, Autoregressive Model, Acceleration, Entropy Coding, Autoencoder

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Learned lossless data compression has garnered significant attention recently due to its superior compression ratios compared to traditional compressors. However, the computational efficiency of these models jeopardizes their practicality. This paper proposes a novel system for improving the compression ratio while maintaining computational efficiency for learned lossless data compression. Our approach incorporates two essential innovations. First, we propose the Finite-State AutoRegressive (FSAR) entropy coder, an efficient autoregressive Markov model based entropy coder that utilizes a lookup table to expedite autoregressive entropy coding. Next, we present a Straight-Through Hardmax Quantization (STHQ) scheme to enhance the optimization of discrete latent space. Our experiments show that the proposed lossless compression method could improve the compression ratio by up to 6\% compared to the baseline, with negligible extra computational time. Our work provides valuable insights into enhancing the computational efficiency of learned lossless data compression, which can have practical applications in various fields. Code is available at https://github.com/alipay/Finite_State_Autoregressive_Entropy_Coding.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Primary Area: general machine learning (i.e., none of the above)

Submission Number: 1165

Loading