Vision-Guided Deep Reinforcement Learning for Autonomous Spacecraft Rendezvous and Docking

Mysore supreeth

Published: 26 Apr 2026, Last Modified: 26 Apr 2026 | AI4Space | Everyone | CC BY 4.0

Keywords: vision-based navigation, deep reinforcement learning, spacecraft docking, domain randomization, sim-to-real transfer, convolutional neural network, autonomous rendezvous, Grad-CAM

TL;DR: An end-to-end vision-RL framework learns spacecraft docking directly from 84x84 pixel images, reaching a 70.8% success rate; domain randomization narrows the gap to the 87.2% state-based oracle.

Abstract: Autonomous spacecraft rendezvous and docking is a critical capability for satellite servicing, orbital debris removal, and space station logistics. While deep reinforcement learning (RL) has shown promise for spacecraft proximity operations, existing approaches predominantly rely on privileged state information—relative position, velocity, and attitude—that must be estimated from raw sensor data in practice. We present an end-to-end vision-guided deep RL framework that learns docking policies directly from monocular camera images, eliminating the need for explicit pose estimation. Our approach integrates a convolutional neural network encoder with Proximal Policy Optimization and Soft Actor-Critic algorithms, trained under Clohessy-Wiltshire-Hill relative orbital dynamics. To address the visual domain gap inherent in simulation-based training, we introduce a systematic domain randomization pipeline encompassing lighting variation, surface texture perturbation, sensor noise injection, and Earth albedo effects, which improves docking success rates by 25 percentage points. Through evaluation of five methods over 500K training timesteps with five random seeds, we show that our best vision-based policy (Vision-SAC) achieves a 70.8% docking success rate from raw 84x84 pixel observations, compared to 87.2% for a state-based oracle with access to the true relative state vector. Gradient-weighted Class Activation Mapping analysis reveals that learned policies attend to physically meaningful features including spacecraft edges and docking port geometry, offering interpretability for safety-critical autonomy.
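For readers unfamiliar with the Clohessy-Wiltshire-Hill model named in the abstract, the following is a minimal sketch of the linearized relative dynamics that a docking simulator of this kind could integrate. It is not the authors' implementation: the axis convention (x radial, y along-track, z cross-track), the RK4 integrator, the function names `cwh_derivative` and `cwh_step`, and the mean-motion value `N_LEO` are all assumptions made for illustration.

```python
import numpy as np

# Assumed mean motion of the target orbit in rad/s (roughly a low Earth orbit).
# This value is illustrative and does not come from the paper.
N_LEO = 0.0011


def cwh_derivative(state, accel, n):
    """Time derivative of the relative state under the CWH equations.

    state: [x, y, z, vx, vy, vz] with x radial, y along-track, z cross-track.
    accel: [ax, ay, az] control acceleration applied to the chaser.
    n:     mean motion of the target's circular orbit (rad/s).
    """
    x, y, z, vx, vy, vz = state
    ax, ay, az = accel
    return np.array([
        vx,
        vy,
        vz,
        3.0 * n**2 * x + 2.0 * n * vy + ax,  # radial
        -2.0 * n * vx + ay,                  # along-track
        -n**2 * z + az,                      # cross-track (decoupled oscillator)
    ])


def cwh_step(state, accel, n, dt):
    """Advance the relative state by dt seconds with a classical RK4 step."""
    k1 = cwh_derivative(state, accel, n)
    k2 = cwh_derivative(state + 0.5 * dt * k1, accel, n)
    k3 = cwh_derivative(state + 0.5 * dt * k2, accel, n)
    k4 = cwh_derivative(state + dt * k3, accel, n)
    return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


# Usage: a chaser displaced 5 m cross-track, coasting with no thrust.
state = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])
for _ in range(500):
    state = cwh_step(state, np.zeros(3), N_LEO, dt=1.0)
```

In an RL setting, a policy's action would typically be mapped to `accel` at each step, with the resulting state either exposed directly (the state-based oracle case) or rendered into an image (the vision-based case). How the paper performs that mapping is not stated in the abstract.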

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 54
