Visible to everyone since 09 May 2025. License: CC BY 4.0.
Enabling robots to navigate efficiently in unknown environments is a key challenge in embodied intelligence. Human exploration relies on accumulated knowledge, spatio-temporal memory, and scene semantic understanding. Inspired by these principles, we propose HuLE-Nav, a zero-shot object navigation method with two core components: multi-dimensional semantic value maps that emulate human-like memory retention and active exploration mechanisms that mimic human behavior. Specifically, HuLE-Nav utilizes Vision-Language Models (VLMs) and real-time observations to dynamically capture semantic relationships between objects, scene semantics, and spatio-temporal exploration history. This information is then represented and iteratively updated in the multi-dimensional semantic value maps. Using these maps, HuLE-Nav employs active exploration mechanisms that integrate dynamic exploration, replanning, collision avoidance, and target verification, enabling flexible long-term goal selection and real-time adaptation of navigation strategies. Experimental results on the challenging HM3D and Gibson datasets show that HuLE-Nav outperforms the best existing competitors in terms of both success rate and exploration efficiency.
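The map-update and goal-selection loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from HuLE-Nav itself: the channel names (`object`, `scene`, `history`), the weights, the running-maximum update rule, and the `ValueMap` class are all hypothetical, and the scores that a VLM would produce are replaced by plain floats.

```python
from dataclasses import dataclass, field

# Hypothetical channel names and weights; the abstract does not specify them.
WEIGHTS = {"object": 0.5, "scene": 0.3, "history": 0.2}


@dataclass
class ValueMap:
    """Toy multi-channel value map over a height x width grid."""
    height: int
    width: int
    channels: dict = field(init=False)

    def __post_init__(self):
        # One grid of floats per channel, all starting at zero.
        self.channels = {
            name: [[0.0] * self.width for _ in range(self.height)]
            for name in WEIGHTS
        }

    def update(self, cells, object_score, scene_score):
        """Blend new scores into the observed cells and count the visit."""
        for r, c in cells:
            for name, score in (("object", object_score), ("scene", scene_score)):
                old = self.channels[name][r][c]
                # Assumed rule: keep the running maximum of each score.
                self.channels[name][r][c] = max(old, score)
            self.channels["history"][r][c] += 1.0

    def value(self, r, c):
        """Combined value: weighted semantic scores minus a revisit penalty."""
        ch = self.channels
        return (WEIGHTS["object"] * ch["object"][r][c]
                + WEIGHTS["scene"] * ch["scene"][r][c]
                - WEIGHTS["history"] * ch["history"][r][c])

    def best_goal(self, candidates):
        """Pick the candidate cell with the highest combined value."""
        return max(candidates, key=lambda rc: self.value(*rc))


# Stand-in scores; in a real system these would come from a VLM.
vmap = ValueMap(4, 4)
vmap.update([(0, 1)], object_score=0.9, scene_score=0.6)
vmap.update([(2, 3)], object_score=0.2, scene_score=0.4)
goal = vmap.best_goal([(0, 1), (2, 3)])
```

In this toy setup the cell with the stronger semantic scores is chosen as the goal, and the history channel lowers the value of a cell each time it is observed again, which is one simple way to discourage revisiting explored regions.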