HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View

TMLR Paper3608 Authors

31 Oct 2024 (modified: 06 Nov 2024)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: We present HawkI, for synthesizing aerial-view images from text and an exemplar image, without any additional multi-view or 3D information for finetuning or at inference. HawkI uses techniques from classical computer vision and information theory. It seamlessly blends the visual features from the input image within a pretrained text-to-2Dimage stable diffusion model with a test-time optimization process for a careful bias-variance trade-off, which uses an Inverse Perspective Mapping (IPM) homography transformation to provide subtle cues for aerialview synthesis. At inference, HawkI employs a unique mutual information guidance formulation to steer the generated image towards faithfully replicating the semantic details of the input-image, while maintaining a realistic aerial perspective. Mutual information guidance maximizes the semantic consistency between the generated image and the input image, without enforcing pixel-level correspondence between vastly different viewpoints. Through extensive qualitative and quantitative comparisons against text + exemplar-image based methods and 3D/ multi-view based novel-view synthesis methods on proposed synthetic and real datasets, we demonstrate that our method achieves a significantly better bias-variance trade-off towards generating high fidelity aerial-view images. Code and data will be made publicly available.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Mathieu_Salzmann1
Submission Number: 3608
Loading