Semantic Image Compression using Textual Transforms

Published: 15 Apr 2024, Last Modified: 04 May 2024
Venue: Learn to Compress @ ISIT 2024 (Poster)
License: CC BY 4.0
Keywords: semantic compression, image compression, textual transform, image captioning, image reconstruction
TL;DR: Textual transform compression can outperform standard lossy compression techniques in rate-semantic distortion analysis; AI-based pipelines have the potential to vastly outperform those with human captions both in rate and in simple practicality.
Abstract: Image textual transforms can be much smaller than JPEGs with comparable degrees of semantic similarity to the original image. Using human semantic satisfaction scores, we demonstrate that the highest-performing textual transforms are often rated similar to JPEGs at both lower (80% by size) and higher (90% by size) degrees of compression, even though the captions are orders of magnitude smaller than the smallest JPEG. AI-based captioners are competitive with humans in textual transform rate-semantic distortion tradeoffs. We compare human captions to those of two AI models (BLIP and GPT-4), accounting for human perceptions of specific semantic content and affect elements in the original and reconstructed images. GPT-4 captions are shorter on average than human captions, yet they capture similar semantic elements and achieve similar semantic fidelity to the original image. Our results support textual transforms as a semantic compression method with better rate-semantic distortion performance than traditional methods. We look forward to specialized semantic loss functions for optimizing end-to-end image captioning and reconstruction models.
Submission Number: 9