Keywords: Human Sensing, Multimodal Learning
Abstract: Multimodal human sensing systems promise unprecedented accuracy and robustness but are often hindered by a critical real-world challenge: missing modalities, which can occur due to hardware failures, environmental interference (e.g., adverse weather for LiDAR), deployment cost constraints, or communication dropouts. The failure of one or more sensors can severely degrade performance, a problem exacerbated by two fundamental and intertwined issues: the Representation Gap between heterogeneous sensor data and the Contamination Effect from low-quality modalities. In this paper, we propose Midas, a novel framework that tackles both root challenges simultaneously through a synergistic integration of meta-learning and knowledge diffusion. To mitigate the Contamination Effect, Midas employs a meta-learning-driven weighting mechanism that dynamically learns to down-weight the influence of noisy, low-contributing modalities. To bridge the Representation Gap, it introduces a diffusion-based knowledge distillation paradigm in which an information-rich teacher, formed from all available modalities, refines the features of each student modality. Comprehensive experiments on the large-scale MM-Fi and XRF55 datasets demonstrate that Midas achieves state-of-the-art performance, significantly improving robustness across diverse missing-modality scenarios. Our work provides a unified and effective solution for building robust, real-world multimodal human sensing systems.
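The abstract describes two mechanisms: a learned per-modality weighting that suppresses contaminated inputs, and teacher-to-student feature distillation from a fused representation. The sketch below is not the authors' Midas implementation; it is a minimal, hypothetical illustration of that pipeline with assumed module names, dimensions, and losses. A simple gating network stands in for the meta-learned weighting, and a plain MSE feature-matching loss stands in for the diffusion-based knowledge distillation.

```python
# Hypothetical sketch of weighted fusion + teacher-student feature distillation.
# All names, dimensions, and losses are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusionDistill(nn.Module):
    def __init__(self, feat_dim=128, num_modalities=3, num_classes=10):
        super().__init__()
        # One encoder per modality; plain linear layers keep the sketch self-contained.
        self.encoders = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim) for _ in range(num_modalities)]
        )
        # Gating network that scores each modality's contribution (stand-in for meta-learned weights).
        self.gate = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, inputs, present_mask):
        # inputs: list of [B, feat_dim] tensors, one per modality
        # present_mask: [B, M] binary mask, 0 where a modality is missing
        feats = torch.stack([enc(x) for enc, x in zip(self.encoders, inputs)], dim=1)  # [B, M, D]
        scores = self.gate(feats).squeeze(-1)                                          # [B, M]
        scores = scores.masked_fill(present_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)              # low-quality / missing modalities get small weight
        teacher = (weights.unsqueeze(-1) * feats).sum(dim=1)  # fused, information-rich teacher feature [B, D]
        # Distillation: pull each available student feature toward the (detached) teacher.
        target = teacher.unsqueeze(1).expand_as(feats).detach()
        per_mod = F.mse_loss(feats, target, reduction="none").mean(-1)                 # [B, M]
        distill = (per_mod * present_mask).sum() / present_mask.sum().clamp(min=1)
        return self.classifier(teacher), distill

# Usage with random data: modality 2 is treated as missing for half the batch.
model = WeightedFusionDistill()
B = 4
xs = [torch.randn(B, 128) for _ in range(3)]
mask = torch.ones(B, 3)
mask[:2, 2] = 0
logits, distill_loss = model(xs, mask)
labels = torch.randint(0, 10, (B,))
loss = F.cross_entropy(logits, labels) + 0.1 * distill_loss
loss.backward()
```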
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6420