Abstract: Video world models have attracted significant attention for their ability to produce high-fidelity future visual observations conditioned on past observations and navigation actions.
However, achieving temporally and spatially consistent generation over long horizons remains an open challenge: existing approaches either compress past frames at fixed rates based on temporal proximity, discarding spatially critical information, or retrieve only a handful of relevant frames without increasing the total amount of retained history.
In this paper, we propose WorldPack, a video world model that introduces spatially-aware compressed memory to address both limitations simultaneously.
The key insight is that compression rates should not be uniform or temporally determined, but should instead be dynamically allocated based on 3D spatial relevance to the current viewpoint.
WorldPack achieves this through two tightly coupled mechanisms: trajectory packing, which fits substantially more historical frames into a fixed-length context through hierarchical frame compression, and geometric selection, which leverages camera pose information and field-of-view overlap to assign lower compression to spatially important frames and higher compression to less relevant ones.
Together, these mechanisms expand the effective context from 4 to 22 frames with only 16% additional inference time, while preserving the most informative frames for spatial reasoning with high fidelity.
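The interplay of the two mechanisms can be illustrated with a minimal Python sketch. It is not WorldPack's implementation. The relevance score (horizontal field-of-view overlap multiplied by a linear distance falloff on 2D poses), the three compression rates, the 256-token cost of an uncompressed frame, and the names `Pose`, `fov_overlap`, `assign_rates`, `frame_cost` and `pack` are all hypothetical stand-ins chosen for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    # Hypothetical simplification: a camera pose on a 2D top-down plane
    x: float
    y: float
    yaw_deg: float  # heading in degrees

def fov_overlap(cur: Pose, past: Pose, fov_deg: float = 90.0, max_dist: float = 20.0) -> float:
    """Heuristic relevance in [0, 1]: horizontal FoV overlap times a distance falloff."""
    # Signed heading difference wrapped into [-180, 180)
    diff = (past.yaw_deg - cur.yaw_deg + 180.0) % 360.0 - 180.0
    angular = max(0.0, fov_deg - abs(diff)) / fov_deg
    dist = math.hypot(past.x - cur.x, past.y - cur.y)
    proximity = max(0.0, 1.0 - dist / max_dist)
    return angular * proximity

def assign_rates(cur, history, n_low=2, n_mid=4, rates=(1, 4, 16)):
    """Return one compression rate per past frame (1 = uncompressed).

    The n_low most relevant frames get rates[0], the next n_mid get rates[1],
    and every remaining frame gets the heaviest compression, rates[2].
    """
    scores = [fov_overlap(cur, p) for p in history]
    # sorted() is stable, so tied frames keep their original (temporal) order
    order = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)
    out = [rates[2]] * len(history)
    for rank, i in enumerate(order):
        if rank < n_low:
            out[i] = rates[0]
        elif rank < n_low + n_mid:
            out[i] = rates[1]
    return out

def frame_cost(rate: int, tokens_per_frame: int = 256) -> int:
    """Context tokens a frame occupies after compression by `rate`."""
    return math.ceil(tokens_per_frame / rate)

def pack(rates, budget_tokens, tokens_per_frame=256):
    """Keep frames in the given priority order, skipping any that no longer fit."""
    kept, used = [], 0
    for i, r in enumerate(rates):
        c = frame_cost(r, tokens_per_frame)
        if used + c <= budget_tokens:
            kept.append(i)
            used += c
    return kept

if __name__ == "__main__":
    cur = Pose(0, 0, 0)
    history = [Pose(0, 0, 180), Pose(1, 0, 10), Pose(15, 0, 0), Pose(0, 1, -5), Pose(30, 0, 0)]
    # Frames facing the same way and nearby get the lightest compression
    print(assign_rates(cur, history, n_low=2, n_mid=2))  # [4, 1, 4, 1, 16]

    budget = 4 * 256                          # token budget of 4 uncompressed frames
    print(len(pack([1] * 22, budget)))        # 4
    schedule = [1] * 2 + [4] * 4 + [16] * 16  # hypothetical hierarchical schedule
    print(len(pack(schedule, budget)))        # 22
```

The schedule in the usage lines is one hypothetical allocation that happens to fit 22 frames into the token budget of 4 uncompressed frames (2 x 256 + 4 x 64 + 16 x 16 = 1024 tokens). It is not taken from the paper, and the sketch does not model inference time.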
We evaluate WorldPack on LoopNav, a Minecraft benchmark for long-horizon spatial consistency, and conduct comprehensive experiments on RECON, a real-world navigation dataset, across multiple evaluation protocols.
WorldPack consistently outperforms strong baselines (including Oasis, Mineworld, DIAMOND, and NWM), with particularly pronounced gains in spatial reasoning tasks that require recall of distant observations.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Miguel_Ángel_Bautista1
Submission Number: 9377