Abstract: MASt3R redefines image matching as a 3D task but suffers from high inference latency (198 ms per image pair on an A40 GPU). We introduce Speedy MASt3R, a post-training optimization framework that reduces inference time by 54% (to 91 ms per pair) without compromising accuracy. Our approach incorporates four key techniques: (1) FlashMatch, which leverages FlashAttention v2 for efficient attention computation; (2) GraphFusion, which optimizes the computation graph using TensorRT; (3) FastNN-Lite, which reduces computational complexity from quadratic to linear; and (4) HybridCast, which enables mixed-precision inference. Evaluations on five benchmarks (Aachen Day-Night, InLoc, 7-Scenes, ScanNet1500, MegaDepth1500) demonstrate consistent performance, highlighting the framework's potential for real-time 3D understanding.
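The two latencies quoted in the abstract (198 ms and 91 ms per image pair) can be related to the 54% figure with a little arithmetic. The sketch below is only an illustration: the function name `latency_summary` and its dictionary keys are hypothetical and do not come from the paper, and the throughput values assume pairs are processed one at a time with no batching.

```python
def latency_summary(baseline_ms: float, optimized_ms: float) -> dict:
    """Summarize the improvement between two per-pair latencies (in ms)."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("latencies must be positive")
    return {
        # Fraction of the baseline time that was eliminated, as a percentage.
        "latency_reduction_pct": 100.0 * (baseline_ms - optimized_ms) / baseline_ms,
        # How many times faster the optimized pipeline is.
        "speedup_factor": baseline_ms / optimized_ms,
        # Sequential throughput, assuming one pair at a time (an assumption).
        "baseline_pairs_per_s": 1000.0 / baseline_ms,
        "optimized_pairs_per_s": 1000.0 / optimized_ms,
    }


# Figures taken from the abstract: 198 ms (MASt3R) and 91 ms (Speedy MASt3R).
summary = latency_summary(198.0, 91.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these inputs, the latency reduction comes out at about 54%, which corresponds to roughly a 2.2x speedup factor and about 11 image pairs per second under the sequential assumption.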
External IDs: dblp:conf/coins/LiLFYPC25