Keywords: efficiency, VAR, frequency
Abstract: Recent studies on Visual Autoregressive (VAR) models have shown that the high-frequency components, i.e., the later steps of the generation process, contribute disproportionately to inference latency. However, the computational redundancy in these steps has not yet been thoroughly investigated. In this paper, we conduct an in-depth analysis of the VAR inference process and identify two primary sources of inefficiency: ***step redundancy*** and ***unconditional branch redundancy***. To address step redundancy, we propose an automatic step-skipping strategy that selectively omits unnecessary generation steps. For unconditional branch redundancy, we observe that the information gap between the conditional and unconditional branches is minimal. Leveraging this insight, we introduce unconditional branch replacement, a technique that bypasses the unconditional branch to reduce computational cost. We further observe that the effectiveness of these acceleration strategies varies significantly across samples. Motivated by this, we propose **SkipVAR**, a sample-adaptive framework that leverages frequency information to dynamically select the most suitable acceleration strategy for each instance. To evaluate the role of high-frequency information, we also introduce multiple high-variation benchmark datasets that measure performance on fine details. Extensive experiments show that SkipVAR achieves over 0.88 average SSIM with up to 1.81$\times$ overall acceleration and 2.62$\times$ lossless speedup on the GenEval benchmark.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 4650
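The sample-adaptive selection described in the abstract can be illustrated with a minimal sketch. It is not the SkipVAR implementation: the function names `high_freq_ratio` and `choose_strategy`, the FFT-based energy ratio, the `cutoff` and `threshold` values, the strategy labels, and the rule that detail-rich samples receive the less aggressive strategy are all assumptions made for illustration only.

```python
import numpy as np


def high_freq_ratio(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of (mean-removed) spectral energy above a radial frequency cutoff.

    Hypothetical helper: `cutoff` is in cycles per pixel and is an assumed value.
    """
    # Collapse colour channels, if any, to a single luminance-like plane.
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    gray = gray.astype(np.float64)
    # Remove the mean so the DC term does not dominate the energy total.
    gray = gray - gray.mean()

    energy = np.abs(np.fft.fft2(gray)) ** 2
    total = energy.sum()
    if total <= 1e-12:
        # A flat image carries no frequency content at all.
        return 0.0

    h, w = gray.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return float(energy[radius > cutoff].sum() / total)


def choose_strategy(image: np.ndarray, threshold: float = 0.05) -> str:
    """Pick an acceleration strategy label from frequency content.

    Assumption: samples rich in high-frequency detail keep their late steps,
    while smoother samples can skip them. Labels and threshold are hypothetical.
    """
    if high_freq_ratio(image) > threshold:
        return "keep_late_steps"
    return "skip_late_steps"


if __name__ == "__main__":
    flat = np.full((64, 64), 0.5)
    checker = (np.indices((64, 64)).sum(axis=0) % 2).astype(np.float64)
    print(choose_strategy(flat), choose_strategy(checker))
```

In this sketch the decision is made per image, so a batch can mix strategies; how the frequency signal is actually obtained and which strategies it selects between are questions for the paper itself.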