Reward Shaping for Safe Spacecraft Proximity Operations: A Comparative Study
Keywords: reward shaping, safe reinforcement learning, spacecraft docking, constrained optimization, Lagrangian methods, curriculum learning, safety-critical systems, proximity operations
TL;DR: A controlled comparison of five reward formulations for spacecraft docking reveals that curriculum-based shaping achieves the best safety-performance balance (80.5% success, 2.2% collision rate), while Lagrangian constraints minimize violations.
Abstract: Reinforcement learning (RL) holds promise for autonomous spacecraft proximity operations, yet deploying learned policies in safety-critical orbital environments demands rigorous attention to constraint satisfaction. Reward function design fundamentally shapes agent behavior, but the effect of different reward formulations on safety outcomes for spacecraft docking remains underexplored. We present a controlled comparison of five reward formulations—sparse, dense, potential-based, Lagrangian-constrained, and curriculum-based—for a six-degree-of-freedom docking task governed by Clohessy-Wiltshire-Hill dynamics. Training all methods with Proximal Policy Optimization across five random seeds, we find that curriculum-based shaping achieves the best balance of performance and safety (80.5% success, 2.2% collision rate), while Lagrangian-constrained rewards minimize safety violations (0.8% keepout zone violations) at the cost of reduced task completion (66.8%). Sparse rewards fail to learn viable policies (33.8% success). These results reveal a fundamental safety-performance tradeoff and yield practical guidelines for reward design under mission-specific safety requirements.
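For concreteness, here is a minimal sketch of two of the reward formulations named in the abstract. The potential function, discount factor, cost signal, and dual-ascent learning rate below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed; not specified in the abstract)

def potential(state):
    """Hypothetical potential: negative distance to the docking port.
    state[:3] is assumed to hold the chaser's relative position (m)
    in the Clohessy-Wiltshire-Hill frame centered on the target."""
    return -np.linalg.norm(state[:3])

def potential_based_reward(base_reward, state, next_state):
    """Potential-based shaping (Ng et al., 1999):
    F(s, s') = gamma * Phi(s') - Phi(s),
    which provably leaves the optimal policy of the base MDP unchanged."""
    return base_reward + GAMMA * potential(next_state) - potential(state)

def lagrangian_reward(base_reward, cost, lam):
    """Lagrangian-constrained reward: a safety cost (e.g., a keepout-zone
    violation indicator) is penalized with multiplier lam."""
    return base_reward - lam * cost

def dual_ascent(lam, mean_episode_cost, cost_limit, lr=0.01):
    """Projected dual-ascent step on lam between policy updates: the
    penalty tightens whenever realized cost exceeds the allowed limit."""
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))
```

In this kind of setup, the Lagrangian multiplier grows until the constraint is satisfied in expectation, which is consistent with the abstract's observation that constrained rewards trade task completion for fewer keepout-zone violations.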
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 55