Keywords: Lossless Semantic Compression, Vector Quantization, Representation Learning
TL;DR: We introduce SOLO-VQ, a vector-quantization method that achieves no loss in downstream task performance while reaching the information-theoretic rate lower bound in controlled environments.
Abstract: Is it possible to derive an optimally compact image representation that preserves semantic information without performance loss for a class of downstream tasks? This paper addresses this fundamental question by providing a formal definition of semantically lossless optimal compression. We introduce a framework called Semantic Optimal Lossless Vector Quantization (SOLO-VQ) as a practical realization of this concept. Unlike prior work, which often relies on heuristics and evaluates on generic image datasets where optimality is unverifiable, we propose a novel evaluation protocol. We construct a series of synthetic datasets and associated tasks for which the information-theoretic rate limits for lossless compression are computable. Within these controlled environments, we empirically demonstrate that SOLO-VQ achieves provably optimal and lossless compression, reaching the theoretical lower bounds. Our work establishes a principled foundation for goal-oriented semantic media compression and suggests a promising methodology for achieving this goal in real-world compressive image transmission.
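The abstract's two central quantities can be illustrated with a minimal sketch. SOLO-VQ's actual architecture is not described here, so the setup below is purely hypothetical: latents cluster around per-class centers, a VQ codebook assigns each latent to its nearest codeword (semantic losslessness means the codes recover the task labels), and the information-theoretic rate lower bound is the entropy H(Y) of the semantic variable. All names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical controlled environment: 4 semantic classes, each with a
# distinct latent center. Observations are centers plus small noise.
num_classes = 4
centers = rng.normal(size=(num_classes, 8))          # latent class centers
labels = rng.integers(0, num_classes, size=1000)     # semantic ground truth
latents = centers[labels] + 0.05 * rng.normal(size=(1000, 8))

# Vector quantization: assign each latent to its nearest codeword.
# Here the codebook is idealized as the true class centers.
codebook = centers
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)

# "Semantically lossless" means the discrete codes preserve the task labels.
print("semantic accuracy:", (codes == labels).mean())

# Rate lower bound: entropy H(Y), in bits, of the semantic variable.
p = np.bincount(labels, minlength=num_classes) / len(labels)
H = -(p[p > 0] * np.log2(p[p > 0])).sum()
print(f"rate lower bound: {H:.3f} bits/sample (log2 K = {np.log2(num_classes):.3f})")
```

With near-uniform labels, H(Y) approaches log2(K) bits per sample, which is the smallest average code length any lossless scheme for the semantic variable can attain; the paper's evaluation protocol checks whether a learned quantizer reaches this bound.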
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15730