Towards Task Planning for Proactive Safety in Home Service Robots: A Scene Graph-Augmented LLM Approach

Sena Ishii; Ankit Ravankar; Yasuhisa Hirata

Published: 27 May 2026, Last Modified: 27 May 2026. Venue: ICRA 2026 SRRA Workshop LightningTalkPoster. License: CC BY 4.0

Keywords: Service robotics, Home safety, Vision-language models, Scene graph, LLM task planning, Semantic reasoning

TL;DR: We use a scene graph-augmented VLM to select safe robot actions at home. Experiments show scene graphs are essential for grounding executable plans, though VLMs reason about risks well without them.

Abstract: Service robots deployed in homes must do more than detect potential hazards—they must decide what to do about them. Prior work has demonstrated that large language and vision-language models (LLMs and VLMs) can infer household accident risks from a single image, but it largely stops at recognition and does not ground this recognition in what the robot can physically do in its own environment. We propose a pipeline that utilizes a pre-built semantic map of the home, represents it as a compact hierarchical scene graph augmented with a statemap—a layer encoding abstract room-level states such as “children playing” or “cluttered”—and feeds it to an LLM together with an onboard RGB observation. The LLM selects one of four grounded actions (MOVE, PUSH, NO ACTION, ALERT ONLY) together with the target object and, when applicable, a destination landmark drawn from the scene graph. We evaluate the pipeline on 15 simulated scenarios in Isaac Sim, comparing conditions with and without the scene graph, and on two real-robot scenarios with a mobile manipulator. Our results show that risk reasoning is robust regardless of the scene graph, while the scene graph improves plan executability by grounding destinations in the robot’s actual environment. A case study on statemap-augmented planning further demonstrates that providing abstract environmental states can redirect the LLM’s destination selection when safety implications are direct—suggesting that scene-state awareness is a promising lever for safer autonomous task planning.
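The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`Room`, `SceneGraph`, `Plan`), the prompt layout in `to_prompt`, and the `is_grounded` check are all hypothetical. It also assumes that only MOVE requires a destination landmark, whereas the abstract says only that a destination is given "when applicable". The sketch shows a two-level scene graph whose rooms carry statemap entries, and a check that a selected destination actually exists in the graph.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    # The four grounded actions named in the abstract.
    MOVE = "MOVE"
    PUSH = "PUSH"
    NO_ACTION = "NO ACTION"
    ALERT_ONLY = "ALERT ONLY"


@dataclass
class Room:
    name: str
    # Statemap layer: abstract room-level states such as "children playing".
    states: list[str] = field(default_factory=list)
    # Landmarks the robot could use as destinations (hypothetical field).
    landmarks: list[str] = field(default_factory=list)


@dataclass
class SceneGraph:
    rooms: dict[str, Room] = field(default_factory=dict)

    def landmarks(self) -> set[str]:
        """All landmarks across rooms."""
        return {lm for room in self.rooms.values() for lm in room.landmarks}

    def to_prompt(self) -> str:
        """Serialize the graph as compact text for an LLM prompt (assumed layout)."""
        lines = []
        for room in self.rooms.values():
            states = ", ".join(room.states) or "none"
            marks = ", ".join(room.landmarks) or "none"
            lines.append(f"room: {room.name} | states: {states} | landmarks: {marks}")
        return "\n".join(lines)


@dataclass
class Plan:
    action: Action
    target: Optional[str] = None
    destination: Optional[str] = None


def is_grounded(plan: Plan, graph: SceneGraph) -> bool:
    """Return True if the plan's destination, when needed, exists in the graph.

    Assumption: only MOVE needs a destination; the other actions pass trivially.
    """
    if plan.action is Action.MOVE:
        return plan.destination in graph.landmarks()
    return True
```

Under these assumptions, a MOVE plan whose destination is not a landmark in the graph would be rejected before execution. This is one way to read the abstract's claim that the scene graph improves plan executability.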

Submission Number: 39
