Keywords: Human-in-the-loop Methods, Failure Recovery, Robot-assisted Feeding, Uncertainty Estimation
Abstract: Robot-assisted bite acquisition requires accurately identifying food items and selecting the correct high-level action (e.g., skewering, scooping, or twirling) to acquire them. While foundation models such as Large Language Models (LLMs) and Vision-Language Models (VLMs) enable modular and generalizable perception-to-action pipelines, their deployment can result in occasional failures, particularly when faced with diverse or ambiguous food items. To ensure safety and reliability in such settings, we propose selectively querying the care recipient for help when uncertainty arises. However, excessive or poorly timed queries can impose a workload on users, especially those with mobility limitations. We introduce a modular human-in-the-loop framework that queries across different components of the pipeline using querying rules informed by both model uncertainty and predicted user workload. We define three querying rules and evaluate them in offline simulations using realistic food plate images. Our results show that all querying rules improve task performance over the baseline, with each offering a different trade-off between task performance and query efficiency. Our framework generalizes beyond assistive feeding and provides a principled approach for safe, efficient querying in foundation-model-driven robotic systems.
Submission Number: 18