Abstract: Multimodal learning has emerged as a promising approach to high-precision localization, a cornerstone of 6G integrated sensing and communications (ISAC), because it integrates measurements from different data sources. Yet its real-world deployment remains challenging because (i) the quality and relevance of different modalities fluctuate with frequency, noise, and antenna heterogeneity, and (ii) spatial and fingerprint ambiguities under non-line-of-sight (NLOS) propagation obscure the mapping between channel measurements and positions. To overcome these challenges, we propose a spatial-context-aware dynamic fusion architecture built on a mixture-of-experts backbone (SCADF-MoE). We first construct a million-scale ray-tracing dataset of synchronized angle, distance, gain, and channel measurements across diverse carrier frequencies, antenna geometries, and noise levels. A three-stage pre-processing pipeline then clusters neighboring points into short trajectories, enriching each data sample with spatial context. The resulting sequences are fed into SCADF-MoE: first, multimodal soft MoE blocks with learnable routing matrices dynamically fuse heterogeneous inputs according to each modality's relevance in a given environmental context; second, a modality-task MoE formulates position estimation as a multi-objective problem, jointly predicting the coordinates of neighboring points to exploit their shared spatial correlations. Additionally, we introduce a regularization loss that enforces expert diversity and mitigates gradient conflicts during multi-task optimization. Simulations across three environments (dense urban, suburban, canyon) and three heterogeneity dimensions (frequency, noise, antenna) show that SCADF-MoE achieves consistent sub-meter accuracy under all conditions, reducing overall MSE by 63% and cutting unseen-NLOS error by 55% compared with state-of-the-art methods.
To the best of our knowledge, this is the first work that leverages large-scale multimodal MoEs for high-precision ISAC localization.
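The "multimodal soft MoE blocks with learnable routing matrices" can be illustrated with the generic Soft MoE layer of Puigcerver et al. The pure-Python sketch below is not the SCADF-MoE implementation. It assumes one embedding token per modality (angle, distance, gain, channel), toy dimensions, and tanh linear experts, and the names `soft_moe` and `make_expert` are hypothetical. A single learnable matrix `phi` produces logits. A softmax over tokens gives dispatch weights, so each expert slot becomes a weighted mix of all modalities. A softmax over slots gives combine weights, so each output token fuses every expert's result.

```python
import math
import random

def matmul(a, b):
    # a: m x k, b: k x n, both as lists of lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def soft_moe(tokens, phi, experts, slots_per_expert):
    """Generic Soft MoE layer (hypothetical helper, not the authors' code).
    tokens: m x d, phi: d x (n * p) learnable routing matrix,
    experts: n callables mapping a d-vector to a d-vector."""
    logits = matmul(tokens, phi)  # m x (n * p)
    # Dispatch: softmax over tokens (per column), so each slot mixes all modality tokens.
    dispatch = transpose([softmax(col) for col in transpose(logits)])
    # Combine: softmax over slots (per row), so each output token mixes all slot outputs.
    combine = [softmax(row) for row in logits]
    slots = matmul(transpose(dispatch), tokens)  # (n * p) x d
    slot_out = [experts[s // slots_per_expert](slots[s]) for s in range(len(slots))]
    return matmul(combine, slot_out), dispatch, combine

# Toy usage: 4 hypothetical modality tokens, 3 experts with 2 slots each.
random.seed(0)
d, n_experts, p = 4, 3, 2
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]
phi = [[random.gauss(0, 1) for _ in range(n_experts * p)] for _ in range(d)]

def make_expert(seed):
    # Hypothetical expert: a fixed random linear map followed by tanh.
    rng = random.Random(seed)
    w = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]
    return lambda x: [math.tanh(sum(w[i][j] * x[j] for j in range(d))) for i in range(d)]

experts = [make_expert(k) for k in range(n_experts)]
fused, dispatch, combine = soft_moe(tokens, phi, experts, p)
print(len(fused), len(fused[0]))  # → 4 4
```

Because both weight matrices are dense softmaxes, routing is fully differentiable and no token is dropped. This is one plausible reason such a design suits fusing modalities whose relevance shifts across environments.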
External IDs: doi:10.1109/jsac.2025.3647414