
# Policy

This directory contains the source code for policy training and evaluation for PBHC. We also provide a pretrained policy for the `Horse-stance pose` motion at `PBHC/code/policy/logs/MotionTracking/pretrained_horse_stance_pose/model_50000.pt`.

Our code is based on the official [ASAP](https://github.com/LeCAR-Lab/ASAP) codebase [1].

[1] ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills 


The following sections provide instructions on how to set up the environment, train the policy, and evaluate the trained policy.




### Hardware Requirements

We tested the code in the following environment:
- **OS**: Ubuntu 20.04
- **GPU**: NVIDIA RTX 4090, Driver Version: 560.35.03 
- **CPU**: 13th Gen Intel(R) Core(TM) i7-13700



### Environment Setup
```bash
# Assuming pwd: PBHC/code/policy
conda create -n pbhc python=3.8
conda activate pbhc

# Install and Test IsaacGym
wget https://developer.nvidia.com/isaac-gym-preview-4
tar -xvzf isaac-gym-preview-4
pip install -e isaacgym/python
cd isaacgym/python/examples
python 1080_balls_of_solitude.py
python joint_monkey.py
cd ../../..

# Install PBHC
pip install -e .
pip install -e isaac_utils
```



### Usage

Policy Training:
- Set `robot.motion.motion_file` in the command below to the motion you want to train on. Our dataset is provided in `data/`.
- Set `num_envs` as needed. We use `4096` for training, but `128` is enough for debugging.
- The output policy checkpoint is saved in `logs/MotionTracking/` in a directory named `YYYYMMDD_HHMMSS-debug-motion_tracking-g1_23dof_lock_wrist` (e.g. `20990521_180647-debug-motion_tracking-g1_23dof_lock_wrist`).
- We train for `50000` iterations for each experiment in the paper.


```bash
python humanoidverse/train_agent.py \
+simulator=isaacgym +exp=motion_tracking +terrain=terrain_locomotion_plane \
project_name=MotionTracking num_envs=128 \
+obs=motion_tracking/tr_hist_priv_up \
+robot=g1/g1_23dof_lock_wrist \
+domain_rand=main \
+rewards=motion_tracking/main \
experiment_name=debug \
robot.motion.motion_file="PaperMotions/Horse-stance_pose.pkl" \
seed=1 \
+device=cuda:0
```
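Because each run directory starts with a `YYYYMMDD_HHMMSS` timestamp, the most recent run can be located programmatically. The following is a minimal sketch, not part of the PBHC codebase: the helper names `parse_run_timestamp` and `latest_run` are hypothetical, and it assumes the directory naming format described above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def parse_run_timestamp(run_name: str) -> Optional[datetime]:
    """Parse the leading YYYYMMDD_HHMMSS stamp of a run directory name.

    Returns None for names without such a stamp
    (e.g. `pretrained_horse_stance_pose`).
    """
    stamp = run_name.split("-", 1)[0]
    try:
        return datetime.strptime(stamp, "%Y%m%d_%H%M%S")
    except ValueError:
        return None


def latest_run(log_root: str) -> Optional[Path]:
    """Return the newest timestamped run directory under log_root, if any."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    stamped = []
    for path in root.iterdir():
        stamp = parse_run_timestamp(path.name) if path.is_dir() else None
        if stamp is not None:
            stamped.append((stamp, path))
    if not stamped:
        return None
    # Compare on the timestamp only; the path is carried along.
    return max(stamped, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Assumes the working directory is PBHC/code/policy.
    print(latest_run("logs/MotionTracking"))
```

The returned directory can then be used to build the `+checkpoint=...` argument for the evaluation scripts.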



Policy Evaluation:
- The following commands provide examples of how to evaluate the trained policy.
- `eval_agent.py`: run the policy with visualization in IsaacGym.
- `sample_eps.py`: run the policy and output the evaluation metrics (accuracy and smoothness). The early termination mechanism is disabled, so the agent runs from the beginning to the end of the motion.
  - Set `num_envs` and `num_episodes` to the number of episodes you want to evaluate. The two values should be the same.
- `ratio_eps.py`: run the policy and output the mean episode length and the ratio of the mean episode length to the reference motion length. The early termination mechanism is enabled, so an episode ends either on early termination or when the motion finishes, and its length is recorded.
  - Usage is the same as for `sample_eps.py`.

```bash
# Visualize the policy in IsaacGym
python humanoidverse/eval_agent.py +device=cuda:0 +env.config.enforce_randomize_motion_start_eval=False +robot.motion.motion_lib_type=Better +env.config.termination.terminate_when_dof_far=False +checkpoint=logs/MotionTracking/pretrained_horse_stance_pose/model_50000.pt

# Compute accuracy and smoothness metrics (early termination disabled)
python humanoidverse/sample_eps.py +device=cuda:0 +checkpoint=logs/MotionTracking/pretrained_horse_stance_pose/model_50000.pt +num_envs=1 +num_episodes=1 +eps_eval_name=samtraj +opt=record

# Compute the mean episode length and its ratio to the reference motion length (early termination enabled)
python humanoidverse/ratio_eps.py +device=cuda:0 +checkpoint=logs/MotionTracking/pretrained_horse_stance_pose/model_50000.pt +opt=record +num_envs=1000 +num_episodes=1000 +eps_eval_name=ratio
```
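To make the quantity reported by `ratio_eps.py` concrete, the sketch below computes a mean episode length and its ratio to a reference motion length. It only illustrates the definition given above and is not the script's actual implementation: the function name `episode_length_ratio`, the use of simulation steps as the unit, and the sample numbers are all assumptions.

```python
from typing import Sequence, Tuple


def episode_length_ratio(
    episode_lengths: Sequence[float], reference_length: float
) -> Tuple[float, float]:
    """Return (mean episode length, mean length / reference motion length).

    Both inputs must use the same unit (steps are assumed here).
    """
    if not episode_lengths:
        raise ValueError("episode_lengths must not be empty")
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    mean_length = sum(episode_lengths) / len(episode_lengths)
    return mean_length, mean_length / reference_length


if __name__ == "__main__":
    # Made-up example: two episodes finish a 200-step motion,
    # two terminate early after 100 steps.
    mean_length, ratio = episode_length_ratio([200, 100, 200, 100], 200)
    print(mean_length, ratio)  # 150.0 0.75
```

Under this definition, a ratio of 1 means that every episode ran for the full length of the reference motion.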

