Revisiting Random Generation Order: Ordinal-Biased Random Training for Efficient Visual Autoregressive Models
Keywords: visual autoregressive model
Abstract: We observe an interesting Ordinal Asymmetry phenomenon when training visual autoregressive (AR) generators with randomized generation paths: *early tokens*, due to limited context, suffer higher losses and primarily capture *global structure*, while *later tokens*, with richer context, incur lower losses and refine *local detail*.
This suggests that conventional randomized training must optimize two qualitatively different ordinal subproblems at once.
From a curriculum perspective, training can focus on one of the two ordinal subproblems instead of optimizing both simultaneously.
Therefore, we propose Ordinal-biased Random Training (ORT), a simple strategy that first biases loss weights toward either early or late tokens and then gradually anneals them to uniform weighting, ensuring that both global structure and local detail are learned.
Specifically, we implement ORT with an *ordinal focal loss* that assigns position-dependent weights; the schedule is controllable and can emphasize either early or late tokens.
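The weighting schedule described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' released code: the function name `ordinal_weights`, the power-law form of the positional bias, and the linear annealing schedule are all assumptions made for clarity.

```python
import numpy as np

def ordinal_weights(num_tokens, step, total_steps, gamma=2.0, bias="late"):
    """Hypothetical sketch of an ordinal-focal weight schedule.

    Early in training the per-position loss weights are biased toward
    late (or early) token positions; the bias anneals linearly to
    uniform weighting by `total_steps`. The exact schedule and the
    power-law bias shape are illustrative assumptions.
    """
    # Normalized ordinal position of each token: 0 (first) .. 1 (last).
    pos = np.arange(num_tokens) / max(num_tokens - 1, 1)
    if bias == "early":
        pos = 1.0 - pos
    # Power-law bias toward the chosen end, normalized to mean 1 so the
    # overall loss scale is unchanged.
    biased = (pos + 1e-8) ** gamma
    biased = biased / biased.mean()
    # Linear annealing factor: 1 at step 0, 0 at total_steps and beyond.
    anneal = max(1.0 - step / total_steps, 0.0)
    # Interpolate between the biased weights and uniform weights.
    return anneal * biased + (1.0 - anneal) * np.ones(num_tokens)
```

In practice, the returned vector would multiply the per-token cross-entropy losses before averaging; because the weights always have mean 1, annealing changes only how the loss is distributed across ordinal positions, not its overall magnitude.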
In practice, ORT shows a striking *sudden convergence*: gradient norms collapse sharply during the middle of randomized-path training, providing clear evidence that late-biased weighting accelerates early-stage optimization.
Experiments on ImageNet-256 with RAR validate our analysis: ORT halves the randomized training phase (200→100 epochs, $2\times$ faster) while maintaining FID comparable to the 400-epoch RAR-XL baseline.
Primary Area: generative models
Submission Number: 4080