Abstract: Accurate camera calibration is a fundamental task for 3D
perception, especially when dealing with real-world, in-the-wild environments where complex optical distortions are
common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce a novel framework that addresses these challenges by jointly modeling
camera intrinsic and extrinsic parameters using a generic
ray camera model. We propose AlignDiff, a diffusion model conditioned on geometric priors that simultaneously estimates
camera distortions and scene geometry. Unlike previous approaches, AlignDiff
shifts focus from semantic to geometric features, enabling
more accurate modeling of local distortions. To enhance distortion prediction,
we incorporate edge-aware attention, focusing the model
on geometric features around image edges, rather than semantic content. Furthermore, to enhance generalizability
to real-world captures, we incorporate a large database of
ray-traced lenses containing over three thousand samples.
This database characterizes the distortion inherent in a diverse variety of lens forms. Our experiments demonstrate
that the proposed method significantly reduces the angular error of estimated ray bundles by ∼8.2° and improves overall
calibration accuracy, outperforming existing approaches on
challenging, real-world datasets.