Keywords: immersive content
Abstract: Generating a complete and explorable 360-degree visual world enables a wide range of downstream applications. While prior works have advanced the field, they remain constrained either by narrow fields of view, which hinder the synthesis of continuous, holistic scenes, or by insufficient controllability, which restricts free exploration by users or autonomous agents. To address this, we propose PanoWorld-X, a novel framework for high-fidelity and controllable panoramic video generation with diverse camera trajectories.
First, we propose a novel pipeline for synthesizing paired panoramic video-trajectory data in virtual 3D environments using Unreal Engine. The pipeline consists of four main steps and enables the collection of a large-scale dataset with rich scene diversity and accurate trajectory annotations.
To achieve precise panoramic video generation, we identify that the bottleneck arises from the misalignment between the spherical geometry of panoramic data and the inductive priors of conventional video diffusion models. To address this, we exploit the spherical connectivity of panoramic data and propose a Sphere-Aware Diffusion Transformer that reprojects equirectangular features onto the spherical surface, thereby capturing geometric adjacency in the latent space. This design significantly improves both visual fidelity and spatiotemporal continuity.
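The core geometric idea behind such a reprojection can be illustrated with the standard equirectangular-to-sphere mapping. The sketch below is a minimal, hypothetical illustration (the function name and grid resolution are assumptions, not the paper's implementation): each ERP pixel is mapped to a unit vector on the sphere, which makes the wrap-around adjacency between the left and right image borders explicit, something a flat 2D grid cannot express.

```python
import numpy as np

def erp_to_sphere(h, w):
    """Map each equirectangular (ERP) pixel center to a unit vector on the sphere.

    Hypothetical helper illustrating spherical reprojection: longitude spans
    [-pi, pi) across the image width, latitude spans [pi/2, -pi/2] down the
    image height.
    """
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    lon = ((u + 0.5) / w - 0.5) * 2.0 * np.pi   # azimuth of pixel center
    lat = (0.5 - (v + 0.5) / h) * np.pi         # elevation of pixel center
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)         # shape (h, w, 3)

pts = erp_to_sphere(4, 8)
# On the sphere, the first and last ERP columns are as close as any two
# adjacent columns: the horizontal seam wraps around.
seam_dist = np.linalg.norm(pts[:, 0] - pts[:, -1], axis=-1)
step_dist = np.linalg.norm(pts[:, 0] - pts[:, 1], axis=-1)
assert np.allclose(seam_dist, step_dist)
```

A sphere-aware attention or convolution operating on these coordinates can thus treat the two ERP borders as neighbors, which is one way the geometric-adjacency property described above can be realized.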
Extensive experiments demonstrate that our PanoWorld-X achieves superior performance in various aspects, including motion range, control precision, and visual quality, underscoring its potential for real-world applications.
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11906