Keywords: Domestic robots, Household robots, Human values, Pluralistic value
Abstract: Household robots are often evaluated by task completion, but everyday domestic environments involve decisions that task success alone does not capture. A robot may face a dilemma in which two possible actions prioritize different human values, such as privacy, safety, efficiency, or social appropriateness. We introduce RobotValues, a benchmark for evaluating household robot planners in value-conflict scenarios. Each instance pairs a realistic household image with two plausible robot actions, each prioritizing a different human value. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and manual quality control. Using RobotValues, we evaluate vision-language models (VLMs) and find that they exhibit default value preferences, including lower default preferences for categories such as compliance and conformity. Although explicit value priorities steer the evaluated models' action choices, the models sometimes fail to override their default preferences when the requested value conflicts with them. These findings suggest that household robot evaluation should move beyond task completion and also measure how robots decide among feasible actions that prioritize diverse human values.
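As a rough illustration of the instance structure described in the abstract (one image, two candidate actions, one prioritized value per action), the following minimal sketch shows how a per-value default preference rate could be tallied from a model's choices. The field names, the `Instance` class, and the `preference_rates` function are hypothetical and are not taken from the RobotValues release; the rate definition (times chosen divided by times offered) is likewise an assumption, not necessarily the metric used in the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one value-conflict scenario."""
    image_path: str  # household scene shown to the model
    action_a: str    # first plausible robot action
    action_b: str    # second plausible robot action
    value_a: str     # human value prioritized by action_a
    value_b: str     # human value prioritized by action_b


def preference_rates(instances, choices):
    """Return, for each value, the fraction of its appearances in which
    the model picked the action that prioritizes it.

    choices[i] must be "a" or "b" and refers to instances[i].
    """
    if len(instances) != len(choices):
        raise ValueError("instances and choices must have the same length")
    chosen = Counter()
    offered = Counter()
    for inst, choice in zip(instances, choices):
        if choice not in ("a", "b"):
            raise ValueError(f"unexpected choice: {choice!r}")
        offered[inst.value_a] += 1
        offered[inst.value_b] += 1
        chosen[inst.value_a if choice == "a" else inst.value_b] += 1
    return {value: chosen[value] / offered[value] for value in offered}


# Toy data, invented purely for illustration.
demo = [
    Instance("kitchen.png", "close the door", "finish cleaning", "privacy", "efficiency"),
    Instance("hallway.png", "wait outside", "remove the hazard", "privacy", "safety"),
    Instance("stairs.png", "block the stairs", "take the shortcut", "safety", "efficiency"),
]
rates = preference_rates(demo, ["a", "b", "a"])
```

Under this definition, a value with a low rate is one the model tends to pass over by default. A steering experiment could reuse the same tally after adding an explicit value priority to the prompt and compare the two sets of rates.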
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 132