Bridging the Sim-to-real Gap in RF Localization with Large-Scale Synthetic Pretraining

Published: 02 Mar 2026, Last Modified: 28 Mar 2026
Venue: ICLR 2026 Workshop DATA-FM
License: CC BY 4.0
Keywords: RF localization, Sim-to-Real, Machine Learning, Deep learning
Abstract: Radio frequency (RF) fingerprinting is a promising localization technique for GPS-denied environments, yet it generalizes poorly to unmapped areas. Traditional k-nearest neighbor methods perform well where data exists but fail on unseen streets. Deep learning can learn generalizable spatial-RF patterns, but requires far more training data than typical measurement campaigns provide. We investigate whether synthetic data can bridge this gap. Using a real-world dataset from Rome and NVIDIA’s Sionna ray-tracing simulator, we generate synthetic datasets under varying fidelity and scale: Dataset B′ uses real base station (BS) locations with Gaussian Process-calibrated signals (53K samples), while Dataset C uses fully simulated BSs and signals (274K samples). Our evaluation reveals a pronounced sim-to-real gap: models achieving 25 m error on synthetic data degrade to 184 m on real data. Even so, pretraining on synthetic data reduces real-world error from 323 m to 162 m, a 50% improvement. Notably, simulation fidelity proves more important than scale: the smaller calibrated dataset outperforms the larger uncalibrated one. We further evaluate cross-city generalization on an unseen Oslo dataset, achieving 132 m zero-shot RMSE and 62 m after fine-tuning. This work provides a systematic study of synthetic-to-real transfer for RF localization, highlighting the value of simulation-aware pretraining.
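To make the k-nearest neighbor baseline and the reported error metric concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the toy fingerprints (RSS values in dBm), and the coordinates (in metres) are all hypothetical, and it only illustrates the general idea of averaging the positions of the closest stored fingerprints and scoring predictions by RMSE.

```python
import math

def knn_localize(train_rss, train_xy, query_rss, k=3):
    """Estimate a position as the mean location of the k nearest fingerprints.

    train_rss: list of RSS vectors (hypothetical units: dBm per base station)
    train_xy:  list of (x, y) positions in metres, aligned with train_rss
    query_rss: one RSS vector measured at the unknown position
    """
    # Rank stored fingerprints by Euclidean distance in signal space.
    ranked = sorted(range(len(train_rss)),
                    key=lambda i: math.dist(train_rss[i], query_rss))
    nearest = ranked[:k]
    x = sum(train_xy[i][0] for i in nearest) / len(nearest)
    y = sum(train_xy[i][1] for i in nearest) / len(nearest)
    return (x, y)

def rmse_m(pred_xy, true_xy):
    """Root-mean-square positioning error in metres."""
    sq = [math.dist(p, t) ** 2 for p, t in zip(pred_xy, true_xy)]
    return math.sqrt(sum(sq) / len(sq))

def relative_reduction(before, after):
    """Fractional error reduction, e.g. for comparing two RMSE values."""
    return (before - after) / before

# Toy fingerprint database (illustrative values only, not from the paper).
train_rss = [[-70.0, -80.0], [-60.0, -90.0], [-90.0, -65.0]]
train_xy = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]

estimate = knn_localize(train_rss, train_xy, [-61.0, -89.0], k=1)
print(estimate)

# The abstract's 323 m -> 162 m figures, expressed as a fraction.
print(round(relative_reduction(323.0, 162.0), 3))
```

A query that falls far from every stored fingerprint still gets mapped onto the nearest mapped positions, which is one way to see why such a lookup cannot extrapolate to unseen streets.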
Submission Number: 170