Abstract: The storage of multimedia data can benefit from advances in general-purpose lossless compression. The explosive growth of multimedia data volume in data centers demands both a higher compression ratio and faster compressor run-time speed. However, recent deep-learning-based compressors with high compression ratios usually build complicated dependencies on history symbols, leading to long compression times. This paper investigates the behavior of history symbols and finds an approximate order of importance: recent symbols have a substantially larger influence on the probability estimation of the next unknown symbol. This observation guides the design of an interpretable structure for data compression, rather than learning the order implicitly from data as Recurrent Neural Networks (RNNs) and attention do. Based on this observation, we disentangle the compression model into order learning and feature learning, which were fused into one large module in previous works. A parameterized ordered mask unit is established to learn the ordered importance of history symbols, and a fast Multi-Layer Perceptron (MLP) network is designed for efficient feature learning. The proposed compressor improves both compression performance and computational efficiency compared with transformer-based and RNN-based compressors. To further enhance computational efficiency, we propose a branch-MLP block that replaces the original MLP layer, halving its parameters and FLOPs without sacrificing compression performance. Experiments on multimedia data demonstrate that our model improves the compression ratio by 10% on average across data domains while doubling compression speed compared with the state of the art. The source code and appendix are released at https://github.com/mynotwo/compressor_via_simple_and_scalable_parameterization.git.
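The abstract does not spell out the internals of the ordered mask or the branch-MLP block, but a minimal sketch can illustrate the two ideas it names. The sketch below assumes the ordered mask is a single learnable weight per history position, softmax-normalized, and that the branch-MLP splits the feature dimension into two independent half-width branches; the class names (`OrderedMask`, `BranchMLP`) and all shapes are hypothetical illustrations, not taken from the paper.

```python
import torch
import torch.nn as nn

class OrderedMask(nn.Module):
    """Hypothetical parameterized ordered mask: one learnable logit per
    history position, softmax-normalized into importance weights that
    training is free to concentrate on the most recent symbols."""
    def __init__(self, history_len: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(history_len))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, T, dim) embeddings of the T most recent history symbols.
        w = torch.softmax(self.logits, dim=0)   # (T,) position importances
        return h * w.view(1, -1, 1)             # reweight each position

class BranchMLP(nn.Module):
    """Hypothetical branch-MLP block: split the feature dimension into two
    halves and run a smaller MLP on each. A full dim->hidden->dim MLP costs
    about 2*dim*hidden parameters; two half-width branches cost about
    dim*hidden, i.e. half the parameters and FLOPs."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Linear(dim // 2, hidden // 2),
                nn.GELU(),
                nn.Linear(hidden // 2, dim // 2),
            )
        self.b1, self.b2 = branch(), branch()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=-1)             # split features in half
        return torch.cat([self.b1(x1), self.b2(x2)], dim=-1)

# Toy usage: weight 16 history-symbol embeddings, then transform features.
h = torch.randn(4, 16, 64)                      # (batch, T, dim)
x = BranchMLP(dim=64, hidden=256)(OrderedMask(history_len=16)(h))
print(x.shape)                                  # torch.Size([4, 16, 64])
```

Under these assumptions the split-and-concatenate design realizes the halving claimed in the abstract: each branch's weight matrices are a quarter the size of the full MLP's, and two branches together sum to half.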