Abstract: We introduce ErLang Sight, a novel multimodal system for liquid identification that integrates vision-based object detection, millimeter-wave (mmWave) Synthetic Aperture Radar (SAR) imaging, and large language model (LLM)-based contextual reasoning. The system first runs a visual detection pipeline to locate potential liquid containers in the environment, then directs a mmWave sensor to perform targeted SAR imaging of the detected regions. From these measurements, it estimates the permittivity of each liquid using reflection coefficient analysis. The physical measurements, together with visual context and environmental indicators (such as whether the scene is a kitchen, laboratory, or bar), are passed to a pretrained LLM, which combines the physics-based data with contextual knowledge to infer the most likely type of liquid. Experimental evaluations show that ErLang Sight significantly improves accuracy in distinguishing visually ambiguous liquids and generalizes robustly to previously unseen environments.
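To make the permittivity step concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest textbook case: a plane wave at normal incidence passing from air into a lossless, non-magnetic liquid, where the Fresnel reflection coefficient is Γ = (1 − √ε_r) / (1 + √ε_r) and can be inverted to recover ε_r. It ignores the container wall and the dielectric loss that real liquids exhibit at mmWave frequencies. The function names and the prompt wording are hypothetical and chosen only for illustration.

```python
import math


def reflection_coefficient(eps_r: float) -> float:
    """Fresnel reflection coefficient at normal incidence, air -> lossless dielectric.

    Assumption: non-magnetic medium with real relative permittivity eps_r >= 1.
    """
    n = math.sqrt(eps_r)
    return (1.0 - n) / (1.0 + n)


def permittivity_from_reflection(gamma: float) -> float:
    """Invert the relation above: eps_r = ((1 - gamma) / (1 + gamma)) ** 2."""
    # For eps_r >= 1 the coefficient lies in (-1, 0]; reject anything else.
    if not -1.0 < gamma <= 0.0:
        raise ValueError("gamma must lie in (-1, 0] for an air-to-dielectric interface")
    return ((1.0 - gamma) / (1.0 + gamma)) ** 2


def build_prompt(eps_r: float, container: str, scene: str) -> str:
    """Assemble a hypothetical text prompt combining physical and contextual cues."""
    return (
        f"A {container} was detected in a {scene}. "
        f"The estimated relative permittivity of its contents is {eps_r:.1f}. "
        "Which liquid is it most likely to be?"
    )


if __name__ == "__main__":
    gamma = reflection_coefficient(4.0)          # forward model for a test value
    eps_r = permittivity_from_reflection(gamma)  # recover the permittivity
    print(build_prompt(eps_r, container="glass bottle", scene="kitchen"))
```

The round trip through `reflection_coefficient` and `permittivity_from_reflection` only checks that the inversion is self-consistent. A practical estimator would need complex permittivity and a model of the container wall.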
External IDs: dblp:conf/mobisys/LiangPLGX25