ID: 6078

ICLR 2026 Submission

This page offers visual examples of FastForward localizing query images across a variety of challenging scenarios, such as symmetric objects, opposing shots, and large-scale scenes. We visualize the camera trajectory of a user trying to localize in a location for which mapping images and poses are available. For example, in the Wayspots dataset the images and camera parameters come from the tracking system used while recording the mapping scan. We also compare FastForward with its closest competitors, MASt3R and Reloc3r, and display the histogram of localization errors for each scan, which gives a clear indication of each method's potential to deliver a good user experience.
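As a rough illustration of what a per-scan error histogram involves, the sketch below computes the translation and rotation error between a predicted and a ground-truth pose and bins the translation errors of a whole scan. It is a minimal sketch under our own assumptions, not the evaluation code behind these figures: poses are assumed to be 4x4 camera-to-world matrices, and the helper names `pose_errors` and `error_histogram` and the bin edges (in scene units) are hypothetical.

```python
import numpy as np


def pose_errors(T_pred, T_gt):
    """Translation error (scene units) and rotation error (degrees).

    Assumes both poses are 4x4 camera-to-world matrices (hypothetical convention).
    """
    # The camera centre is the translation column of a camera-to-world matrix.
    t_err = float(np.linalg.norm(T_pred[:3, 3] - T_gt[:3, 3]))
    # The angle of the relative rotation R_pred^T R_gt is the geodesic rotation error.
    R_rel = T_pred[:3, :3].T @ T_gt[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err = float(np.degrees(np.arccos(cos_angle)))
    return t_err, r_err


def error_histogram(preds, gts, edges=(0.0, 0.1, 0.25, 0.5, 1.0, 5.0)):
    """Histogram of translation errors over one localization scan.

    The bin edges are illustrative only, not the ones used on this page.
    """
    edges = np.asarray(edges, dtype=float)
    t_errs = np.array([pose_errors(p, g)[0] for p, g in zip(preds, gts)])
    # np.histogram ignores values outside the bin range, so clip gross
    # failures into the last (right-closed) bin instead of dropping them.
    t_errs = np.minimum(t_errs, edges[-1])
    counts, bin_edges = np.histogram(t_errs, bins=edges)
    return counts, bin_edges
```

Clipping to the last edge keeps failed localizations visible in the final bin; a rotation-error histogram can be built the same way from the second value returned by `pose_errors`.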

Visual Localization


We show examples of FastForward performing mapping and localization of query images in a single forward pass. Instead of using all available mapping images, we represent the scene with the top K images selected by a retrieval step: K = 20 mapping images for outdoor data and K = 10 for indoor data. From each mapping image we sample 20% of the features. The predicted query pose is shown in blue and the ground-truth pose in green. The camera trajectories of the mapping images are shown in gray, and the camera frustums of the mapping images used in the prediction are displayed as well.
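The scene-construction step described above can be sketched as follows. This is a hypothetical illustration, not FastForward's code: the page does not specify the retrieval method or how features are sampled, so we assume cosine similarity between global image descriptors for retrieval and uniform random sampling of feature tokens. Only K = 20 (outdoor), K = 10 (indoor) and the 20% ratio come from the text; the function names and array layouts are our own.

```python
import numpy as np


def select_mapping_images(query_desc, map_descs, k):
    """Indices of the k mapping images most similar to the query.

    Hypothetical retrieval: cosine similarity between global descriptors.
    """
    q = query_desc / np.linalg.norm(query_desc)
    m = map_descs / np.linalg.norm(map_descs, axis=1, keepdims=True)
    sims = m @ q
    # Sort by descending similarity and keep the first k (most similar first).
    return np.argsort(-sims)[:k]


def subsample_features(features, ratio=0.2, rng=None):
    """Keep `ratio` of one image's feature tokens (uniform sampling assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    n_keep = max(1, int(round(ratio * len(features))))
    idx = rng.choice(len(features), size=n_keep, replace=False)
    # Sorting keeps the retained tokens in their original order.
    return features[np.sort(idx)]


def build_scene_representation(query_desc, map_descs, map_features, outdoor, rng=None):
    """Top-K retrieval followed by 20% feature sampling per mapping image."""
    k = 20 if outdoor else 10  # K values stated on this page
    top = select_mapping_images(query_desc, map_descs, k)
    feats = [subsample_features(map_features[i], 0.2, rng) for i in top]
    return top, feats
```

Here `map_features` is assumed to be a list with one `(num_tokens, dim)` array per mapping image; the returned subset would then be passed, together with the query, to a single forward pass.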

There are two videos per dataset, one for each scene.

Extreme Localization Scenarios


We evaluate FastForward against the baselines Reloc3r and MASt3R in extreme visual localization scenarios. For the Wayspots scenes, we only display Reloc3r estimates, because MASt3R was trained on this dataset. In the Wayspots scenes, we highlight two challenging scenarios: opposing shots (i.e., mapping and query scans taken from opposite viewpoints) and symmetric scenes (i.e., scenes that look similar from different viewpoints). We also present results on the Cambridge dataset to show how well the methods generalize to scenes whose scale ranges were not seen during training.
