High-Fidelity Semantic Video Communication with Controllable Image-To-Video Diffusion Models

Cem Eteke, Alexander Griessel, Wolfgang Kellerer, Eckehard G. Steinbach

Published: 2025, Last Modified: 26 Feb 2026, ISM 2025, CC BY-SA 4.0

Abstract: This work addresses the fidelity problem in low-bitrate, real-time semantic video communication, where reconstruction fidelity is crucial to the user experience. We present I2V-SC, which extends baseline diffusion-based image-to-video (I2V) synthesis with ControlNet and distillation tailored to semantic video coding, enabling real-time performance with high fidelity. Evaluations using perceptual, pixel-level, motion, and semantic metrics demonstrate that I2V-SC outperforms both baseline I2V and a baseline semantic video communication approach, CVSC, in ultra-low-bitrate ($<0.006$ bpp), real-time-enabled settings. Subjective evaluations confirm that I2V-SC also improves user quality of experience (QoE) in terms of overall preference.
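To give a sense of scale for the $<0.006$ bpp figure, the sketch below converts bits per pixel into a channel bitrate. It assumes bpp is averaged over every pixel of every frame, and the 512x512 resolution and 25 fps frame rate are hypothetical values chosen only for illustration; the abstract does not state the video format used in the evaluation.

```python
def bitrate_kbps(bpp: float, width: int, height: int, fps: float) -> float:
    """Convert bits per pixel (bpp) to kilobits per second.

    Assumes bpp is averaged over all pixels of all frames.
    """
    bits_per_frame = bpp * width * height
    return bits_per_frame * fps / 1000.0


# Hypothetical format (not taken from the paper): 512x512 at 25 fps.
rate = bitrate_kbps(0.006, 512, 512, 25)
print(f"{rate:.1f} kbps")
```

Under these assumed values, the 0.006 bpp bound corresponds to roughly 39 kbps.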

External IDs: dblp:conf/ism/EtekeGKS25