Abstract: AV1 is a video codec developed by leading technology companies to meet the increasing demands of modern video applications. Fractional Motion Estimation (FME), the focus of this work, is an important AV1 encoder tool. FME employs interpolation filters to generate sub-pixel predictions, thereby improving motion estimation accuracy. In AV1, FME uses sophisticated interpolation filters that can be combined in the horizontal and vertical directions, with the optimal filter pair selected by the Interpolation Filter Search (IFS) process. This paper presents DM-FIFS, a dual-model, machine-learning-based approach designed to overcome the limited filter prediction accuracy of prior solutions, which often led to suboptimal trade-offs between computational effort and coding efficiency. By splitting the decision space between two specialized models, DM-FIFS predicts filters more accurately and thus strikes a better balance between computational savings and coding-efficiency losses than single-model approaches. The paper also presents a set of assessment and ablation experiments, a comprehensive discussion of key innovations in the AV1 encoder, and a detailed analysis of the interpolation filters used in AV1 FME. Experimental results show that DM-FIFS reduces IFS execution time by 51.40% with only a 0.11% increase in Bjøntegaard Delta bitrate (BD-BR), demonstrating a superior trade-off between computational effort and coding efficiency. To the best of our knowledge, DM-FIFS is the most advanced machine-learning-based solution reported to date for reducing the computational complexity of AV1 IFS.
External IDs: doi:10.1109/tcsi.2025.3624335