Keywords: Space Robotics, Vision Language Model, Safety Bounding
TL;DR: Architectures for integrating VLM-based semantic safety into space robotics exploration and navigation
Abstract: Planetary exploration robots operate under constraints that challenge modern autonomy: communication latency limits learning from human intervention, mass and power budgets restrict sensing and compute, and training data is scarce. While geometric perception is commonly used for navigation, it often fails to capture semantically meaningful hazards that are not well defined by geometry alone. Satellite imagery can help manage growing uncertainty bounds by marking out regions known to be unsafe, but occluded environments lack this support. Vision-language models (VLMs) offer a way to reason about such semantic uncertainty, but their unpredictable failure modes limit their use in safety-critical systems. We propose that robust autonomy in space is better achieved through architectural integration of geometric and semantic perception than by relying solely on training better models. We introduce a framework in which a VLM acts as a conservative semantic safety advisor, augmenting a geometric planner with safety bounds such that uncertainty results in over-restriction rather than unsafe actions. We evaluate three integration strategies: single-pass zero-shot detection, multi-stage decomposed reasoning with temporal filtering, and proposal verification with iterative refinement. Preliminary results indicate that these architectures improve safety over a geometric-only baseline in simulated navigation tasks.
Submission Number: 14