ShadowFlow: Learning Ambient Shadow Motion as a Non-Visual State Modality for Embodied Language Interaction
Keywords: Device-Free Localization, Ambient Light Sensing, Shadow Motion Representation, Multi-View Fusion, Embodied State Perception
Abstract: Language-grounded embodied agents require accurate and continuous human state localization in indoor environments, but camera-based tracking is often unacceptable in privacy-sensitive applications.
Existing device-free approaches under unmodulated light lack a structured motion representation that supports sparse sensing and multi-view sequence learning.
To address this gap, we present ShadowFlow, a non-imaging framework that infers continuous 2D trajectories from ambient illumination using sparse photodiode (PD) arrays, without active modulation or visual capture.
ShadowFlow lifts sparse PD readings into a differentiable grayscale shadow field on a virtual wall and derives a compact shadow flow tensor using lightweight optical flow operators.
Because shadow deformation is view-dependent and spatially heterogeneous, ShadowFlow encodes each view with parallel attention encoders and recurrently fuses the views to aggregate complementary spatial cues for trajectory regression.
On 927 minutes of real-world recordings from seven participants in two indoor layouts, ShadowFlow achieves centimeter-level accuracy, with a 2.35 cm mean localization error, and supports real-time inference on embedded hardware.
These results indicate that ambient shadow flow is a practical non-visual motion modality that supports cross-modal grounding for embodied language interaction and robotic perception.
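The flow derivation described above can be sketched in simplified form. The abstract does not specify the exact operators, so the snippet below is only an illustrative assumption: a synthetic grayscale shadow field (a dark blob on a virtual wall) and a single-window least-squares optical-flow estimate in the Lucas-Kanade style. The names `shadow_field` and `global_flow` are hypothetical, not the paper's API.

```python
import numpy as np

def shadow_field(cx, cy, size=32, sigma=3.0):
    # Hypothetical stand-in for the lifted shadow field: a dark Gaussian
    # blob (the shadow) on a bright virtual wall.
    y, x = np.mgrid[0:size, 0:size]
    return 1.0 - np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

def global_flow(f0, f1):
    # Least-squares solution of the brightness-constancy constraint
    # Ix*u + Iy*v + It = 0, with one window covering the whole field
    # (a minimal "lightweight optical flow operator").
    Iy, Ix = np.gradient(f0)        # spatial gradients
    It = f1 - f0                    # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Shadow shifted by +1 px along x between consecutive frames.
f0 = shadow_field(14.0, 16.0)
f1 = shadow_field(15.0, 16.0)
u, v = global_flow(f0, f1)
```

In the full system this per-frame estimate would be dense and stacked over time into the shadow flow tensor; the global two-parameter fit here is only the simplest correct instance of the idea.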
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: vision-language navigation, cross-modal pretraining, cross-modal application, multimodal applications, multimodal grounding, cross-modal information extraction
Contribution Types: Model analysis & interpretability, Data analysis, Theory
Languages Studied: English
Submission Number: 3375