Machine Perceptual Quality: Evaluating the Impact of Severe Lossy Compression on Audio and Image Models
Abstract: We evaluate various perception models, including image classification, segmentation, speech recognition, and music source separation, under severe lossy compression. Figure 1 summarizes the results underlying our insights. The datasets in the top row were originally encoded at near-lossless quality (ratios of about 5:1), while those in the bottom row are lossless. We apply additional compression to these six datasets using conventional, neural, and generative codecs, yielding ratios between 20:1 and 1000:1. Our results indicate three key findings: (1) across nearly all tasks, generative compression methods like HiFiC and EnCodec deliver the best performance despite having the lowest bitrates; (2) downstream performance correlates strongly with deep similarity metrics like LPIPS; and (3) using lossily compressed datasets such as ImageNet for pre-training can lead to counter-intuitive scenarios where severe lossy compression improves performance rather than degrading it. Our results provide a basis for integrating more aggressive compression into perception systems. Our code and experiments are available at: https://github.com/danjacobellis/MPQ.