Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion

Published: 26 Jan 2026, Last Modified: 28 Feb 2026
Venue: ICLR 2026 Poster
License: CC BY 4.0
Keywords: Diffusion Models, Image Generation, Spherical Neural Field
TL;DR: ASIG learns a spherical latent with diffusion and uses a spherical neural field to control FOV, viewpoint, and resolution, enabling high-quality perspective/panorama/fisheye/irregular synthesis from a single model.
Abstract: Existing diffusion models excel at generating diverse content, but they remain confined to fixed image shapes and cannot flexibly control spatial attributes such as viewpoint, field-of-view (FOV), and resolution. To fill this gap, we propose Arbitrary-Shaped Image Generation (ASIG), the first generative framework that enables precise spatial attribute control while supporting high-quality synthesis across diverse image shapes (e.g., perspective, panoramic, and fisheye). ASIG introduces two key innovations: (1) mesh-based spherical latent diffusion that generates a complete scene representation, with a seam-enforcement denoising strategy that maintains semantic and spatial consistency across viewpoints; and (2) a spherical neural field that samples arbitrary regions of the scene representation under coordinate conditions, enabling distortion-free generation at flexible resolutions. As a result, ASIG controls viewpoint, FOV, and resolution within a single unified framework. Experiments demonstrate clear improvements over prior methods specifically designed for individual shapes. Code is available at https://github.com/xjyjjy/ASIG.
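To make the idea of coordinate conditions concrete, the sketch below computes the spherical coordinates (longitude, latitude) that each pixel of a perspective view looks at, given a FOV, a viewpoint, and a resolution. It is an illustration under stated assumptions, not the authors' implementation: the function name `perspective_sphere_coords`, the pinhole camera model, and the axis and angle conventions are all hypothetical choices, and the neural field that would consume these coordinates is not shown.

```python
import numpy as np


def perspective_sphere_coords(height, width, fov_deg, yaw_deg=0.0, pitch_deg=0.0):
    """Hypothetical helper: per-pixel (lon, lat) in radians for a pinhole view.

    Assumed conventions: camera x points right, y points down, z points
    forward; positive yaw turns right, positive pitch looks up.
    """
    # Focal length in pixels derived from the horizontal field of view.
    focal = 0.5 * width / np.tan(np.radians(fov_deg) / 2.0)

    # Pixel-center offsets from the principal point.
    xs = np.arange(width) - (width - 1) / 2.0
    ys = np.arange(height) - (height - 1) / 2.0
    x, y = np.meshgrid(xs, ys)

    # Unit ray directions in the camera frame, shape (height, width, 3).
    rays = np.stack([x, y, np.full_like(x, focal)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate rays into the world frame: pitch about x first, then yaw about y.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(pitch), -np.sin(pitch)],
                      [0.0, np.sin(pitch), np.cos(pitch)]])
    rot_y = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(yaw), 0.0, np.cos(yaw)]])
    rays = rays @ (rot_y @ rot_x).T

    # Convert directions to longitude and latitude on the unit sphere.
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(-rays[..., 1], -1.0, 1.0))
    return lon, lat


# Example: a 3x5 view with a 90-degree FOV, turned 90 degrees to the right.
lon, lat = perspective_sphere_coords(3, 5, fov_deg=90.0, yaw_deg=90.0)
```

Changing `height` and `width` resamples the same region of the sphere at a different resolution, while `fov_deg`, `yaw_deg`, and `pitch_deg` change which region is covered. Other image shapes, such as fisheye or panoramic, would swap the pinhole ray model for a different pixel-to-ray mapping.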
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10528