Keywords: Direct Latent-Space Classification, Variational Autoencoder (VAE), Learned Image Compression, Hyperprior Side-Information, High-Throughput Screening, Representation Learning
TL;DR: This empirical analysis demonstrates that fusing hyperprior side-information into latent-space classifiers decreases accuracy and increases latency. At the studied compression level, primary latents are semantically saturated, making fusion redundant.
Abstract: Executing computer vision tasks directly within the compressed latent space of variational autoencoders (VAEs) offers significant computational advantages by bypassing the decompression bottleneck. In this paper, we investigate the semantic utility of hierarchical hyperpriors—traditionally used for spatial entropy estimation—as a side-information gating mechanism for direct latent-space classification. Utilizing a balanced 100,000-image subset of the AGAR microbial dataset, we demonstrate that a baseline Latent-ResNet operating strictly on primary latents achieves a mean Top-1 accuracy of 96.32\%, closely trailing a pixel-space EfficientNet-B0 (97.13\%). Contrary to theoretical intuition, our proposed Fusion-Gated Hyperprior architecture yields a slight performance degradation (95.57\%) alongside increased total system latency. This empirical ablation study suggests that at the specific compression fidelity of Quality Level 3, primary latent representations are semantically saturated for structural classification tasks, rendering hyperprior variance data redundant and mildly noisy. These findings provide bounded system-design parameters for deploying latency-optimized inference pipelines on pre-compressed data arrays.
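The abstract's gating idea can be illustrated with a minimal sketch. The paper does not specify the fusion architecture, so the function below (`fusion_gate`, with per-channel parameters `w`, `b`) is a hypothetical NumPy rendering of one common design: the hyperprior-derived scale map, upsampled to the latent grid, produces a sigmoid gate that modulates the primary latents before they reach the classifier head.

```python
import numpy as np

def sigmoid(x):
    # Numerically plain logistic; fine for illustration.
    return 1.0 / (1.0 + np.exp(-x))

def fusion_gate(y, z_scale, w, b):
    """Gate primary latents with hyperprior side-information (hypothetical sketch).

    y       : (C, H, W) primary latents from the VAE encoder
    z_scale : (C, H, W) hyperprior scale (std-dev) map, already upsampled
              to the latent grid
    w, b    : (C,) per-channel gate weight and bias (learned in practice)
    """
    gate = sigmoid(w[:, None, None] * z_scale + b[:, None, None])  # values in (0, 1)
    return y * gate  # gated latents fed to the latent-space classifier

# Toy usage with random tensors standing in for real latents.
rng = np.random.default_rng(0)
y = rng.standard_normal((4, 8, 8))
z_scale = np.abs(rng.standard_normal((4, 8, 8)))  # scales are non-negative
fused = fusion_gate(y, z_scale, np.ones(4), np.zeros(4))
```

Because the gate is bounded in (0, 1), fusion can only attenuate latent activations; if the primary latents are already semantically saturated, as the abstract argues, this modulation adds noise rather than signal.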
Submission Number: 100