Abstract: This work addresses the fidelity problem in low-bitrate, real-time semantic video communication, a problem that is central to the user experience. We present I2V-SC, which extends baseline diffusion-based Image2Video (I2V) synthesis with ControlNet and distillation tailored to semantic video coding, enabling real-time performance with high fidelity. Evaluations using perceptual, pixel-level, motion, and semantic metrics show that I2V-SC outperforms both the baseline I2V and a baseline semantic video communication approach, CVSC, in ultra-low-bitrate ($<0.006$ bpp), real-time-capable settings. Subjective evaluations confirm that I2V-SC also improves user quality of experience (QoE), measured by overall preference.
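To give a sense of scale for the $<0.006$ bpp budget, the following minimal sketch converts a bits-per-pixel figure into a channel bitrate. It assumes the common definition of bpp as total bits divided by the total number of pixels across all frames. The resolution (1280x720) and frame rate (30 fps) are hypothetical values chosen for illustration and are not taken from the paper, as is the helper name `bitrate_bps`.

```python
def bitrate_bps(bpp: float, width: int, height: int, fps: float) -> float:
    """Return the bitrate in bits per second implied by a bits-per-pixel budget.

    Assumes bpp = total_bits / (width * height * number_of_frames).
    """
    return bpp * width * height * fps


if __name__ == "__main__":
    # Hypothetical stream parameters (not from the paper).
    width, height, fps = 1280, 720, 30
    kbps = bitrate_bps(0.006, width, height, fps) / 1000
    print(f"0.006 bpp at {width}x{height}, {fps} fps is about {kbps:.1f} kbit/s")
```

Under these assumed parameters, the 0.006 bpp ceiling corresponds to roughly 166 kbit/s; other resolutions or frame rates scale the figure proportionally.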
External IDs: dblp:conf/ism/EtekeGKS25