Abstract: Efficiently compressing HD/UHD content has long been challenging because of its high bitrate cost. Instance-adaptive enhancement methods tackle this issue by compressing a video at reduced resolution and then enhancing it with a neural model overfitted specifically to that video. However, existing methods focus solely on spatial super-resolution (SR) and under-utilize the video's temporal redundancy. Their limited control over which model parameters are updated also incurs excessive overfitting overhead. This paper therefore introduces IASTE, the first instance-adaptive enhancement method based on spatial-temporal enhancement (STE), which incorporates low-rank adaptation (LoRA) for efficient model overfitting. Specifically, we downscale each video spatially and temporally to reduce the data volume and compress it efficiently. We then overfit an STE model to each video and use it to enhance the decoded video's spatiotemporal resolution. Leveraging the Video Swin Transformer's strong ability to capture spatiotemporal correlations, we design a lightweight and efficient model for video STE. The model is overfitted to each video using LoRA: by freezing the pre-trained model and updating only a few low-rank matrices, the bitrate overhead of storing the model is reduced. Experiments show that, compared with directly compressing high-frame-rate (HFR), high-resolution (HR) videos, our method achieves around 30% BD-Rate gains on the CTC and UVG datasets, about 15% on the YoutubeUGC dataset, and about 10% on the ultra-long videos in the Xiph dataset.
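The LoRA idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the layer sizes, and the `scale` parameter are hypothetical, chosen only to show why updating two low-rank matrices is cheaper to store than updating a full weight matrix. A frozen weight `W` (d_out x d_in) is left untouched, and only `A` (r x d_in) and `B` (d_out x r) would be trained and transmitted per video.

```python
# Illustrative sketch only: names, shapes, and `scale` are assumptions,
# not taken from the paper.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, w_frozen, lora_a, lora_b, scale=1.0):
    """Compute (W + scale * B @ A) @ x without ever modifying W."""
    base = matvec(w_frozen, x)    # output of the frozen pre-trained layer
    low = matvec(lora_a, x)       # project the input down to rank r
    delta = matvec(lora_b, low)   # project back up to d_out
    return [b + scale * d for b, d in zip(base, delta)]


def full_params(d_out, d_in):
    """Parameters stored if the whole weight matrix were updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Parameters stored if only the low-rank pair (A, B) is updated."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    w = [[1, 2], [3, 4]]          # frozen weight, 2 x 2
    a = [[1, 0]]                  # rank-1 down-projection, 1 x 2
    b = [[1], [2]]                # rank-1 up-projection, 2 x 1
    print(lora_forward([1, 1], w, a, b))
    print(full_params(512, 512), lora_params(512, 512, 4))
```

For a hypothetical 512 x 512 layer with rank 4, the low-rank pair holds 4,096 values against 262,144 for the full matrix, which is the kind of saving that makes per-video model updates affordable in the bitstream.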
External IDs: doi:10.1109/tip.2025.3602648