Spatial Affordance Prediction for Egocentric Task-Driven Navigation

RSS 2025 Workshop EgoAct Submission

Published: 01 May 2025 (modified: 10 Jun 2025) · License: CC BY 4.0
Keywords: egocentric perception, robot navigation, spatial affordance
TL;DR: We learn task-conditioned spatial affordance priors from egocentric video demonstrations, and apply them to egocentric robot navigation.
Abstract: We investigate the problem of spatial affordance prediction for egocentric task-driven navigation: predicting the locations in an environment where a given task is likely to be performed, from a single egocentric image and a natural language task query. Our end-to-end model encodes environment context and task semantics by fine-tuning a vision-language framework on egocentric human demonstrations drawn from large-scale cooking activity videos. The model outputs spatial regions representing task affordances relative to the egocentric camera pose. These predictions outperform a nearest-neighbor baseline based on pretrained vision-language similarity, particularly on novel tasks and viewpoints. We apply the spatial affordance predictions to two robotic navigation applications: (1) localizing goals for task completion, and (2) defining task-based obstacles to avoid disturbing humans in a shared environment.
Submission Number: 4