UniDiff: Spectral–Spatial Vision Models under Unified Diffusion

Jinchang Zhang; Jiakai Lin; Xinrou Kang; Zijun Li; Guoyu Lu

15 Sept 2025 (modified: 16 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0

Keywords: Spectral, Spatial, Unified Diffusion

Abstract: We present UniDiff, a general-purpose vision backbone driven by a unified diffusion operator. Under a single diffusion-semigroup framework, we construct a parallel Spatial–Spectral Diffusion module: the spectral path applies a heat-kernel multiplier to achieve global low-pass homogenization, while the spatial path performs anisotropic diffusion to preserve boundaries. Both paths are conditioned on a shared latent control field that predicts their parameters, and we impose PDE-residual and energy-consistency constraints (with boundary consistency as an auxiliary constraint) at the operator level to pin the two steps to the same physical clock, eliminating the smear–sharpen counteraction and time-scale drift. Compared with self-attention, UniDiff attains efficient and interpretable global modeling. On ImageNet-1K, UniDiff achieves 84.2%/84.8%/85% Top-1 accuracy at the Tiny/Small/Base scales. On COCO object detection and instance segmentation and on ADE20K semantic segmentation, it also shows advantages in parameter count, FLOPs, and inference throughput over peer baselines. With O(N log N) computational complexity and linear memory, UniDiff provides a unified and controllable spectral–spatial modeling paradigm, delivering robust representations for classification, detection, and segmentation.
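As a rough illustration of the two paths named in the abstract, the NumPy sketch below applies a heat-kernel multiplier in the Fourier domain (spectral path) and one explicit anisotropic diffusion step (spatial path) to a single-channel feature map. It is not the authors' implementation. The function names, the Perona-Malik conductance, the periodic boundary handling, and the parameters `t`, `dt`, and `kappa` are assumptions made for this example, and the shared latent control field and the PDE-residual and energy-consistency constraints are omitted.

```python
import numpy as np


def heat_kernel_spectral(x, t):
    """Spectral path (assumed form): multiply the 2-D spectrum by exp(-t * |omega|^2).

    This is the heat semigroup on a periodic grid; the FFT gives O(N log N) cost.
    """
    h, w = x.shape
    # Angular frequencies in radians per pixel along rows and columns.
    wy = 2.0 * np.pi * np.fft.fftfreq(h)
    wx = 2.0 * np.pi * np.fft.fftfreq(w)
    omega_sq = wy[:, None] ** 2 + wx[None, :] ** 2
    multiplier = np.exp(-t * omega_sq)  # equals 1 at DC, so the mean is kept
    return np.fft.ifft2(np.fft.fft2(x) * multiplier).real


def anisotropic_step(x, dt=0.2, kappa=0.1):
    """Spatial path (assumed form): one explicit Perona-Malik diffusion step.

    The conductance g = exp(-(diff / kappa)^2) shrinks toward zero across
    large jumps, so strong edges are left nearly untouched.
    dt <= 0.25 keeps the 4-neighbour explicit scheme stable.
    """
    flux = np.zeros_like(x)
    for axis in (0, 1):
        for shift in (1, -1):
            diff = np.roll(x, shift, axis=axis) - x  # periodic neighbour difference
            flux += np.exp(-(diff / kappa) ** 2) * diff
    return x + dt * flux


if __name__ == "__main__":
    # Hypothetical input: a vertical step edge on a 32x32 grid.
    x = np.zeros((32, 32))
    x[:, 16:] = 1.0
    smoothed = heat_kernel_spectral(x, t=2.0)
    preserved = anisotropic_step(x)
    print("spectral path, max change:", np.abs(smoothed - x).max())
    print("spatial path, max change:", np.abs(preserved - x).max())
```

On a step edge the spectral path blurs the boundary while the spatial path leaves it essentially unchanged, which is the smear-versus-preserve contrast that the abstract's constraints are meant to reconcile on a common time scale.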

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 5412
