SpaceGym: A Gymnasium-Based Benchmark for Deep Reinforcement Learning in Spacecraft Proximity Operations

Published: 26 Apr 2026, Last Modified: 26 Apr 2026 · AI4Space · CC BY 4.0
Keywords: Reinforcement learning, spacecraft docking, Gymnasium benchmark, proximity operations, Clohessy-Wiltshire dynamics, deep RL, attitude control, space autonomy
TL;DR: We present SpaceGym, a suite of three Gymnasium-compatible environments for benchmarking deep RL in spacecraft docking, rendezvous, and attitude control, with a comprehensive evaluation of PPO, SAC, TD3, and DQN.
Abstract: We present SpaceGym, a suite of three Gymnasium-compatible environments for benchmarking deep reinforcement learning in spacecraft proximity operations. SpaceGym provides standardized, physically grounded simulation environments for three core tasks: docking, rendezvous, and attitude control. The docking and rendezvous environments implement the Clohessy-Wiltshire (Hill) relative-motion equations for translational dynamics, while the attitude control environment implements Euler's rotational equations with quaternion kinematics. All environments expose the standard Gymnasium interface, enabling seamless integration with existing reinforcement learning libraries. We benchmark four deep reinforcement learning algorithms—Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), and Deep Q-Network (DQN)—across all three environments, training each configuration for 500K timesteps over five random seeds (60 total runs). Our results, evaluated entirely in simulation, show that SAC achieves the highest success rates across all tasks (82% docking, 62% rendezvous, 75% attitude control), while DQN consistently underperforms due to the limitations of discretizing continuous action spaces. Ablation studies demonstrate that dense reward shaping improves docking success rates from 35% to 82% over sparse alternatives, and that observation space design significantly influences learning efficiency. SpaceGym addresses the absence of standardized reinforcement learning benchmarks for spacecraft autonomy and provides a reproducible experimental foundation for future research in this domain.
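The Clohessy-Wiltshire (Hill) dynamics the abstract names can be sketched as a minimal, dependency-free propagation loop. This is an illustrative sketch, not SpaceGym's implementation: the function names, integrator choice (semi-implicit Euler), and the ISS-like mean motion are assumptions made here for the example.

```python
def cw_accel(x, y, z, vx, vy, vz, n):
    """Clohessy-Wiltshire relative-motion accelerations (unforced).

    Linearized dynamics of a chaser about a target in a circular orbit:
        x'' = 3*n^2*x + 2*n*vy   (radial)
        y'' = -2*n*vx            (along-track)
        z'' = -n^2*z             (cross-track)
    where n is the target orbit's mean motion [rad/s].
    """
    ax = 3.0 * n**2 * x + 2.0 * n * vy
    ay = -2.0 * n * vx
    az = -(n**2) * z
    return ax, ay, az

def step(state, n, dt):
    """One semi-implicit Euler step of the unforced CW dynamics."""
    x, y, z, vx, vy, vz = state
    ax, ay, az = cw_accel(x, y, z, vx, vy, vz, n)
    vx, vy, vz = vx + ax * dt, vy + ay * dt, vz + az * dt
    x, y, z = x + vx * dt, y + vy * dt, z + vz * dt
    return (x, y, z, vx, vy, vz)

# Example: chaser 100 m ahead of an ISS-like target (n ~ 0.00113 rad/s),
# coasting with zero relative velocity. A pure along-track offset is an
# equilibrium of the linearized CW dynamics, so the state stays put.
n = 0.00113
state = (0.0, 100.0, 0.0, 0.0, 0.0, 0.0)
for _ in range(600):  # 10 minutes at dt = 1 s
    state = step(state, n, 1.0)
```

The leader-follower equilibrium above is a useful sanity check for any CW integrator: a radial or cross-track offset drifts or oscillates, while an along-track offset does not.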
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 53