Geometric Inductive Priors in Diffusion-Based Optical Flow Estimation

Published: 09 Jul 2025, Last Modified: 09 Jul 2025, BEW 2025 Oral, CC BY 4.0
Keywords: diffusion models, geometric deep learning, hypercomplex neural networks, optical flow estimation, Clifford algebra, geometric algebra
TL;DR: GA-DDVM improves optical flow estimation by integrating a geometric prior through Geometric Algebra: it restricts the model to 2D vectors and to operations such as scaling and rotation, yielding faster-converging diffusion models with minimal added complexity.
Abstract: Diffusion models are ubiquitous in generative modeling, and their prevalence in structured prediction tasks is increasing. The denoising diffusion vision model (DDVM), for example, achieves state-of-the-art accuracy on tasks such as monocular depth and optical flow estimation. We introduce GA-DDVM, a modified version of DDVM working in Geometric Algebra (GA) that includes a geometric prior to constrain diffusion for faster and more accurate optical flow estimation. We constrain diffusion in two key ways: (i) we restrict the types of objects learned by the pipeline to 2D vector fields (i.e., optical flows), and (ii) we limit the operations performed by the network layers on these objects to scaling and rotations. GA-DDVM demonstrates substantial improvements over the baseline DDVM that emerge early in training and persist across all checkpoints: at 600k training steps, GA-DDVM reduces the endpoint error (EPE) on the KITTI dataset by 76.3% and reduces the KITTI Fl-all metric from 76.8% to 20.1%. The Sintel-clean and Sintel-final errors similarly drop from 11.4 to 3.38 and from 11.7 to 4.46, respectively. By embedding geometric structure directly into the diffusion process, GA-DDVM shows that incorporating domain priors into generative models can yield substantially faster convergence with minimal additional complexity in network architecture. This opens up promising directions for structured prediction tasks across domains where geometric constraints are inherent.
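To make the constraint in (ii) concrete, the following is a minimal NumPy sketch, not the authors' architecture. It assumes the 2D geometric algebra with basis vectors e1, e2 and bivector e12, where multiplying a vector x e1 + y e2 by an even element a + b e12 gives (ax - by) e1 + (ay + bx) e2, i.e. a rotation combined with a uniform scaling. The function names `scale_rotate` and `endpoint_error` are hypothetical and chosen only for illustration; `endpoint_error` follows the usual definition of EPE as the mean Euclidean distance between predicted and ground-truth flow vectors.

```python
import numpy as np


def scale_rotate(flow, scale, angle):
    """Scale and rotate every vector of a 2D flow field (illustrative only).

    flow  : array of shape (H, W, 2) holding the (u, v) flow components.
    scale : scalar or array broadcastable to (H, W).
    angle : rotation angle in radians, scalar or broadcastable to (H, W).

    Equivalent to multiplying each vector u*e1 + v*e2 by the even
    multivector a + b*e12 with a = scale*cos(angle), b = scale*sin(angle).
    """
    a = scale * np.cos(angle)
    b = scale * np.sin(angle)
    u, v = flow[..., 0], flow[..., 1]
    return np.stack([a * u - b * v, a * v + b * u], axis=-1)


def endpoint_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow."""
    return np.linalg.norm(pred - gt, axis=-1).mean()


if __name__ == "__main__":
    # A constant flow pointing along +u, rotated by 90 degrees and doubled.
    flow = np.zeros((2, 3, 2))
    flow[..., 0] = 1.0
    out = scale_rotate(flow, scale=2.0, angle=np.pi / 2)
    print(out[0, 0])                  # approximately [0, 2]
    print(endpoint_error(out, flow))  # distance between (0, 2) and (1, 0)
```

Because the operation has only two parameters per vector (a scale and an angle), a layer built this way can only stretch and turn flow vectors, which illustrates the kind of restriction the abstract describes; how GA-DDVM parameterizes and learns these quantities is not specified here.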
Track: Full paper (8 pages excluding references, same as main conference requirements)
Submission Number: 2