# SAC for Dexterous Control

This repository contains the code that powers our anonymous submission on locally linear embeddings (LLE) for Soft Actor-Critic (SAC) agents in **DexGym** and **RoboSuite**. The implementation builds heavily on the open-source [CleanRL](https://github.com/vwxyzjn/cleanrl) codebase and keeps its coding style (Tyro-based argument parsing, TensorBoard logging, and Weights & Biases hooks) for reproducibility.

## Repository Layout
- `sac_continuous_action_robosuite*.py`: SAC variants for RoboSuite manipulation tasks (`sac`, `sac_lle`, `sac_joint_lle`, `sac_dbc`, `sac_spr`, and SSL-augmented baselines).
- `sac_continuous_action_dexgym*.py`: Matching variants for DexGym tasks.
- `plot.py`, `plot_attn_maps.py`: Utilities for aggregating TensorBoard curves and visualizing attention/LLE statistics.
- `sbatch_file_continuous_{dexgym,robosuite}.sh`, `launch_robosuite_experiments.sh`: Job-launch helpers for large sweeps.
- `cleanrl/`, `cleanrl_utils/`, `docs/`: Vendored snapshots from CleanRL for reference; only the SAC implementations remain active in this repo.
- `requirements.txt`, `requirements/`: Curated requirement files for different clusters/setups.

## Environment Setup
1. **System dependencies**
   - Ubuntu 20.04/22.04 (or similar) with CUDA-capable GPUs is recommended.
   - [MuJoCo 2.3+](https://mujoco.readthedocs.io/) plus `GL`, `Xvfb`, and `patchelf` packages for headless rendering.
   - Up-to-date NVIDIA drivers. For headless rendering, export `MUJOCO_GL=osmesa` (the batch scripts already set this for headless jobs).

2. **Python environment**
   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   pip install --upgrade pip
   pip install -r requirements.txt
   ```
   - For lighter installs you can choose one of the `requirements/requirements-*.txt` files (e.g., `requirements-mujoco.txt`).

3. **External simulators**
   - **RoboSuite**: `pip install git+https://github.com/ARISE-Initiative/robosuite` (and install MuJoCo as described in their README).
   - **DexGym (dexterous_gym)**: install from its upstream repository (e.g., `pip install git+https://github.com/intuitive-robotics-lab/dexterous-gym`) and follow their asset download instructions.
   - Validate that `import robosuite` and `import dexterous_gym` succeed before launching training.
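
The import check above can be scripted as a small preflight step. This is a minimal sketch, not part of the repository: the helper name `failed_imports` is hypothetical, and it only confirms that the packages import, not that their assets or rendering backends are configured correctly.

```python
import importlib


def failed_imports(names):
    """Try to import each module and return {name: error message} for the failures."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or a backend error raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Intended use before training (an empty dict means both simulators imported):
#   failed_imports(["robosuite", "dexterous_gym"])
```

Catching a broad `Exception` is deliberate here, since simulator packages can fail at import time for reasons other than a missing install (for example, a missing shared library).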

## Running Experiments
Every training script exposes a Tyro-powered CLI. Running with `--help` prints all flags, including LLE-specific hyperparameters.
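
For readers unfamiliar with Tyro, the following sketch shows the general pattern: a dataclass declares the arguments, and `tyro.cli` turns each field into a command-line flag. The field names are taken from the example commands in this README; the defaults and the `parse_args` helper are placeholders, and the real scripts define many more fields.

```python
from dataclasses import dataclass


@dataclass
class Args:
    """Hypothetical subset of the training arguments; defaults are placeholders."""

    env_id: str = "Lift-Panda"
    total_timesteps: int = 500_000
    seed: int = 1
    exp_name: str = "sac_lle"
    lle_batch_size: int = 2048
    local_window_size: int = 40
    track: bool = False  # boolean fields become on/off switches
    wandb_project_name: str = "sac-robosuite"


def parse_args(argv=None):
    """Parse command-line flags into Args; argv=None reads sys.argv."""
    import tyro  # imported lazily so Args can be inspected without Tyro installed

    return tyro.cli(Args, args=argv)
```

Because the arguments live in a plain dataclass, the parsed object can be written to disk unchanged, which is how a per-run record such as `args.txt` stays in sync with the CLI.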

### RoboSuite
```bash
python sac_continuous_action_robosuite_lle.py \
  --env-id Lift-Panda \
  --total-timesteps 500000 \
  --seed 1 \
  --exp-name lift_lle_seed1 \
  --lle_batch_size 2048 \
  --track --wandb-project-name sac-robosuite
```
- Replace `sac_continuous_action_robosuite_lle.py` with `sac_continuous_action_robosuite.py`, `..._joint.py`, `..._spr.py`, or `..._dbc.py` to run the plain SAC baseline or one of the other variants.
- Outputs land in `runs-robosuite/<exp_name>/` and include `args.txt`, TensorBoard events, CSV files tracking LLE losses, and checkpointed `*.pth` models.
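
To check that a finished run produced everything listed above, a short inventory script can help. This is a sketch under assumptions: the helper `summarize_run` is hypothetical, it searches recursively because the exact layout inside the run directory is not documented here, and it relies on TensorBoard's standard `events.out.tfevents.*` file naming.

```python
from pathlib import Path


def summarize_run(run_dir):
    """Group the artifact file names found under one run directory by kind."""
    run_dir = Path(run_dir)
    return {
        "args": sorted(p.name for p in run_dir.rglob("args.txt")),
        "tensorboard": sorted(p.name for p in run_dir.rglob("events.out.tfevents.*")),
        "csv": sorted(p.name for p in run_dir.rglob("*.csv")),
        "checkpoints": sorted(p.name for p in run_dir.rglob("*.pth")),
    }


# Intended use (the run name is illustrative):
#   summarize_run("runs-robosuite/lift_lle_seed1")
```

An empty list under any key is a quick signal that the run stopped early or wrote its outputs elsewhere.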

### DexGym
```bash
python sac_continuous_action_dexgym_lle.py \
  --env-id EggHandOver-v0 \
  --total-timesteps 5000000 \
  --seed 1 \
  --exp-name egg_lle_seed1 \
  --local_window_size 40 \
  --lle_batch_size 2048 \
  --track --wandb-project-name sac-dexgym
```
- Available variants mirror the RoboSuite ones (`sac`, `sac_lle`, `sac_joint_lle`, `sac_recon-*`, `sac_spr`, `sac_dbc`).
- Logs are stored in `runs-dexgym/` with the same structure as RoboSuite.

### Logging Notes
- Pass `--track` (default `False`) to enable Weights & Biases logging. Set `WANDB_API_KEY` (or use `WANDB_MODE=offline`) before launching.
- TensorBoard files are sufficient to reproduce the plots. All scripts write `args.txt` so that each run is self-describing.

## Batch Jobs
- `sbatch_file_continuous_dexgym.sh` and `sbatch_file_continuous_robosuite.sh` contain the exact commands used for the large sweeps. Update the placeholder lines (`#SBATCH --account=<your_allocation>` and `#SBATCH --mail-user=<your_email@example.com>`) before submitting.
- Both scripts expect a virtual environment at `~/lle-rl` (activated via `~/lle-rl/bin/activate`), configure MuJoCo paths, and forward the algorithm and environment arguments to the training script. Example:
  ```bash
  sbatch sbatch_file_continuous_dexgym.sh sac_lle dexgym EggHandOver-v0 40 2048 1e-2 1e-3 1e-10 1e-5 1e-3 v1 1 "--track"
  ```
- `launch_robosuite_experiments.sh` shows how multiple RoboSuite jobs are batched into a single SLURM submission.

## Plotting & Diagnostics
- **Learning Curves**: `plot.py` reads TensorBoard logs and produces publication-ready PDFs.
  ```bash
  python plot.py --suite dexgym --games PenCatchOverarm-v0 EggCatchOverarm-v0 --algorithms sac sac_lle sac_joint_lle --smooth_radius 500
  python plot.py --suite robosuite --games Lift-Panda NutAssemblySingle-Panda --log_dir runs-robosuite/ --output_dir plots --filename robosuite.pdf
  ```
- **Attention & LLE Visuals**: `plot_attn_maps.py` inspects `runs-dexgym/<exp>/` directories, loads saved checkpoints, and saves per-step attention maps for debugging representation learning.
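
The `--smooth_radius` flag suggests that curves are smoothed over a window around each point. The sketch below shows one common way to do that, a centered moving average; it is an assumption about the flag's meaning, not a copy of what `plot.py` does, and the function name `smooth` is hypothetical.

```python
from itertools import accumulate


def smooth(values, radius):
    """Centered moving average; the window is truncated at both ends of the series."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(values)
    prefix = [0.0, *accumulate(values)]  # prefix[k] is the sum of the first k values
    smoothed = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed
```

With `radius=500`, each plotted point would average up to 1001 neighboring samples, which trades temporal resolution for a less noisy curve. The prefix sums keep the cost linear in the series length regardless of the radius.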

## Reproducibility Checklist
- Set `--seed` for every run. Scripts seed the environment, PyTorch, and replay buffers consistently.
- Keep `args.txt`, CSV files, and checkpointed models found in `runs-{suite}/` for auditing.
- The SLURM scripts pin MuJoCo, CUDA, and WandB settings to reduce cluster-to-cluster variance; adapt the module loads to your cluster if needed.
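
For reference, the seeding pattern commonly used in CleanRL-style scripts looks roughly like the sketch below. The helper `seed_everything` is hypothetical, and the NumPy and PyTorch imports are optional only so that the snippet runs without them; the training scripts themselves may seed additional components.

```python
import random


def seed_everything(seed, torch_deterministic=True):
    """Seed the Python, NumPy, and PyTorch random number generators with one value."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; skipped in this sketch
    try:
        import torch

        torch.manual_seed(seed)
        torch.backends.cudnn.deterministic = torch_deterministic
    except ImportError:
        pass  # PyTorch not installed; skipped in this sketch


# Environments are seeded separately. With a Gymnasium-style API (an assumption
# about the wrappers in use) that would be:
#   env.reset(seed=seed)
#   env.action_space.seed(seed)
```

Even with identical seeds, GPU kernels and simulator versions can introduce small numerical differences, so it is worth comparing runs across several seeds.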

## Credits
- **CleanRL** (MIT License): base SAC implementations, Tyro interface, and logging utilities. Source snapshot included under `cleanrl/` and `cleanrl_utils/`.
- **RoboSuite**: Panda and Sawyer manipulation environments from the [ARISE Initiative](https://github.com/ARISE-Initiative/robosuite).
- **Dexterous Gym (`dexterous_gym`)**: Dexterous manipulation benchmark tasks from its public repository.
- **Stable-Baselines3**: Replay buffer implementation used throughout the SAC variants.
