Abstract: This paper develops a complete framework for perceptual video coding with anisotropic diffusion-based abstraction and completion under spatio-temporal variation regularity. A soft clustering method is applied to retrieve transferable semantic implications of sampled frames in a sparser form. The restoration inference process, posed as a learning-equivalent optimization problem over a given set of sparse data, serves as a non-parametric, exemplar-based sampling method that takes 3-D spatio-temporal similarity into account. In addition to pixel-wise color-based matching, two patch-based similarity metrics, one built on motion-based pixel-wise similarity and one on semantic coherence, are introduced for exemplar-based reconstruction in complex situations. For each pixel in the abstracted frames, a set of matched patches provides inference, and the pixel's color is predicted from multiple hypotheses by weighted averaging. A pixel-wise confidence map based on spatio-temporal features is also provided for selecting important points, which reduces the computational cost to an acceptable level. We validate both compression efficiency and restoration performance in terms of coding gain, SSIM, optical flow consistency, and just-noticeable distortion (JND) on a variety of sources.
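As a rough illustration of the multi-hypothesis prediction step, the sketch below blends the colors proposed by several matched patches into one pixel value. The abstract does not specify the weighting function, so the exponential kernel, the bandwidth `h`, and the function name `predict_pixel` are all assumptions made for this example and should not be read as the paper's actual formulation.

```python
import numpy as np


def predict_pixel(hypotheses, distances, h=10.0):
    """Blend candidate colors for one pixel by weighted averaging.

    hypotheses: (K, C) colors proposed by K matched patches.
    distances:  (K,) patch dissimilarities (smaller means a better match).
    h:          assumed bandwidth controlling how quickly weights decay.
    """
    hypotheses = np.asarray(hypotheses, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # Shift by the minimum distance for numerical stability; the
    # normalized weights are unchanged by this shift.
    weights = np.exp(-(distances - distances.min()) / h)
    weights /= weights.sum()
    # Weighted average over the K hypotheses, one value per channel.
    return weights @ hypotheses


if __name__ == "__main__":
    candidates = [[0.0, 0.0, 0.0], [10.0, 20.0, 30.0]]
    print(predict_pixel(candidates, [1.0, 1.0]))
```

With equal distances the weights are uniform and the result is the plain mean of the candidates; as one distance grows relative to `h`, that hypothesis contributes less and the prediction approaches the best match.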