# When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
This is the official PyTorch implementation of the paper "When a Robot is More Capable than a Human: Learning from Constrained Demonstrators".

## Abstract
Learning from demonstrations enables experts to teach robots complex tasks using interfaces such as kinesthetic teaching, joystick control, and sim-to-real transfer. However, these interfaces often restrict the expert's ability to demonstrate optimal behavior due to usability challenges, setup constraints, and safety concerns. For example, a joystick can move a robotic arm only in a 2D plane, even though the robot operates in a higher-dimensional space. We observe that such limitations on the expert’s data collection can severely degrade the quality of demonstrations and, consequently, the performance of learned policies. This raises a key question: *if the expert is constrained, can a robot use self-supervised learning techniques to learn a better policy than the one the expert demonstrated?*
We propose using task progress as the reward signal, which is derivable from constrained demonstrations and generalizable to unseen states. This reward signal enables learning policies that outperform the expert when constraints are relaxed.
Through extensive experiments, including real-world validation on a WidowX robotic arm, we show that our approach outperforms baseline methods in both sample efficiency and final task performance.

![teaser figure](images/constrained_teaser.png)
*A human expert constrained by a low-DoF joystick interface yields suboptimal trajectories. Following rewards that measure task progress, a robot can execute motions outside the demonstrations.*
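To make "task progress as the reward signal" concrete, here is a minimal sketch. It assumes a goal-proximity-style labeling, which the `goal_prox` package name and the `--pf-delta 0.99` option in the commands below suggest but this README does not confirm: the state at step `t` of a demonstration that reaches the goal at step `T` is labeled `delta ** (T - t)`, and each transition is rewarded by the change in progress. `progress_labels` and `progress_rewards` are hypothetical helpers, not this repository's implementation.

```python
from typing import List, Sequence


def progress_labels(traj_len: int, delta: float = 0.99) -> List[float]:
    """ASSUMPTION: label step t of a successful demo with delta ** (T - t).

    The final (goal) state gets 1.0; earlier states decay geometrically.
    """
    T = traj_len - 1
    return [delta ** (T - t) for t in range(traj_len)]


def progress_rewards(progress: Sequence[float]) -> List[float]:
    """Reward each transition by the gain in progress: p(s_{t+1}) - p(s_t)."""
    return [progress[t + 1] - progress[t] for t in range(len(progress) - 1)]


labels = progress_labels(5, delta=0.5)
print(labels)                    # [0.0625, 0.125, 0.25, 0.5, 1.0]
print(progress_rewards(labels))  # [0.0625, 0.125, 0.25, 0.5]
```

In the method itself, progress would come from a learned model that can also score states outside the demonstrations, which is what allows the robot to be rewarded for motions the constrained expert never made; the hand-built list here only stands in for such a model's outputs.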

## Installation
1. This code base requires `Python 3.10`. All package requirements are in
`requirements.txt`. To install from scratch using Anaconda, use the following
commands.

```bash
conda create -n goal310 python=3.10
conda activate goal310
pip install -r requirements.txt

cd d4rl
pip install -e .
cd ../rl-toolkit
pip install -e .
```

2. Set up [Weights and Biases](https://wandb.ai/site): log in with `wandb login <YOUR_API_KEY>`, then create `config.yaml` containing your W&B project name and entity (your username or team name):
```yaml
{
  "proj_name": "your-project-name",
  "wb_entity": "your-entity-name"
}
```
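Before launching long runs, it can help to confirm that `config.yaml` parses and that both fields are filled in. The check below is a hypothetical helper, not part of this repository. It assumes the file is written in strict JSON syntax (no trailing commas), which is also valid YAML, so Python's standard `json` module can read it; a block-style YAML file would need PyYAML's `yaml.safe_load` instead.

```python
import json
from pathlib import Path

# Keys shown in the example config above.
REQUIRED_KEYS = ("proj_name", "wb_entity")


def load_wandb_config(path: str = "config.yaml") -> dict:
    """Parse config.yaml (JSON-compatible syntax assumed) and check the W&B fields."""
    cfg = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not cfg.get(key)]
    if missing:
        raise ValueError(f"config.yaml is missing values for: {', '.join(missing)}")
    return cfg


if __name__ == "__main__":
    try:
        print(load_wandb_config())
    except (OSError, ValueError) as err:  # json.JSONDecodeError is a ValueError
        print(f"Config problem: {err}")
```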

3. Download the expert demonstration datasets to `./expert_datasets`. The datasets are hosted on [Google Drive](https://drive.google.com/drive/folders/1GQ1wfpo3iXOdKoj0kbvY6mbXXTwPg27u?usp=sharing), and we provide a script that downloads them:
```bash
python download_demos.py
```
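To confirm the download worked, a quick check can list which of the dataset files passed to `--traj-load-path` in the commands below are still missing. This is a hypothetical helper, not part of the repository, and it does not assume that `download_demos.py` fetches exactly these files.

```python
from pathlib import Path

# Dataset files referenced by the --traj-load-path flags in this README.
EXPECTED_DATASETS = [
    "minigrid_cardinal_ac.pt",
    "maze2d_full_ac.pt",
    "maze2d_box_01.pt",
    "fetchpick_box_01.pt",
    "fetchpush_box_005.pt",
    "widowx_v2.pt",
]


def missing_datasets(root: str = "./expert_datasets") -> list:
    """Return the expected dataset files that are not present under root."""
    root_path = Path(root)
    return [name for name in EXPECTED_DATASETS if not (root_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    print("All datasets found." if not missing else f"Missing: {missing}")
```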

## Demo Collection
1. `goal_prox/envs/fetch/fetch_pickplace_dems.py` generates Fetch-Pick demonstrations: `python goal_prox/envs/fetch/fetch_pickplace_dems.py --easy-obs --env-name holdout`

2. `goal_prox/envs/fetch/fetch_push_dems.py` generates Fetch-Push demonstrations: `python goal_prox/envs/fetch/fetch_push_dems.py --easy-obs`

3. `demo_collection/maze2d_demo_gen.py` generates Maze2d demonstrations: `python demo_collection/maze2d_demo_gen.py --env_name MBRLmaze2d-v0 --mz-box-constrained True`

4. `widowx_expert/examples/save_traj_joystick.py` collects WidowX joystick demonstrations: `python widowx_expert/examples/save_traj_joystick.py --env_name WidowXLiftCube-v2`

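The abstract's joystick example, where the expert can move the arm only in a 2D plane, can be illustrated geometrically: whatever 3D motion the expert intends, only its component inside the joystick's plane reaches the robot. The sketch below is an assumption-laden illustration, not how `save_traj_joystick.py` maps joystick input; `project_to_plane` and the choice of the x-y plane are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_to_plane(v: Vec3, u1: Vec3, u2: Vec3) -> Vec3:
    """Project a desired 3D displacement v onto the plane spanned by orthonormal u1, u2.

    Motion along the plane normal is lost: the constrained demonstrator cannot command it.
    """
    a, b = dot(v, u1), dot(v, u2)
    return tuple(a * p + b * q for p, q in zip(u1, u2))


# HYPOTHETICAL joystick plane: the horizontal x-y plane.
X, Y = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
desired = (0.3, -0.2, 0.5)  # the ideal motion also rises by 0.5
print(project_to_plane(desired, X, Y))  # (0.3, -0.2, 0.0)
```

A policy trained only to imitate such demonstrations inherits the missing axis; a progress-based reward, by contrast, can still favor motions along it once the robot's full action space is available.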
## Reproducing Experiments
![experiment settings](images/experiment_setting.png)
*We evaluate on manipulation and navigation tasks whose expert demonstration datasets differ in the kind and degree of constraint.*
### Minigrid-LfCD

1. ConstrainedExpert Setting
- GRIP: `python goal_prox/main.py --prefix grip --alg dpf-est-drop --env-name MiniGrid-FourRooms-Long-Hypotenuse-v0 --num-env-steps 2e7 --traj-batch-size 32 --traj-load-path ./expert_datasets/minigrid_cardinal_ac.pt --traj-frac 1.0 --gw-diag-action-space True --interpolation-dmode exp --pre-num-epochs 2 --prox-lr 0.005`
- Proximity-Drop: `python goal_prox/main.py --prefix prox_drop --alg dpf-drop --env-name MiniGrid-FourRooms-Long-Hypotenuse-v0 --traj-batch-size 32 --traj-load-path ./expert_datasets/minigrid_cardinal_ac.pt --traj-frac 1.0 --gw-diag-action-space True`
- Proximity: `python goal_prox/main.py --prefix prox --alg dpf --env-name MiniGrid-FourRooms-Long-Hypotenuse-v0 --traj-batch-size 32 --traj-load-path ./expert_datasets/minigrid_cardinal_ac.pt --traj-frac 1.0 --gw-diag-action-space True`
- BC: `python goal_prox/main.py --prefix bc --env-name MiniGrid-FourRooms-Long-Hypotenuse-v0 --alg bc --traj-load-path ./expert_datasets/minigrid_cardinal_ac.pt --traj-frac 1.0 --bc-num-epochs 1000 --max-grad-norm -1 --traj-val-ratio 0 --traj-batch-size 32 --gw-diag-action-space True`
- GAIL: `python goal_prox/main.py --prefix gail --env-name MiniGrid-FourRooms-Long-Hypotenuse-v0 --alg gail --disc-lr 0.0001 --traj-load-path ./expert_datasets/minigrid_cardinal_ac.pt --traj-frac 1.0 --action-input True --gail-reward-norm True --gw-diag-action-space True`
- GAIfO: `python goal_prox/main.py --prefix gaifo --env-name MiniGrid-FourRooms-Long-Hypotenuse-v0 --alg gaifo --disc-lr 0.0001 --traj-load-path ./expert_datasets/minigrid_cardinal_ac.pt --traj-frac 1.0 --gail-reward-norm True --gw-diag-action-space True`
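The Minigrid commands load `minigrid_cardinal_ac.pt` and pass `--gw-diag-action-space True`. Assuming this means the demonstrator moved only in the four cardinal directions while the learner may also move diagonally (an interpretation of the names, not stated in this README), a quick count on an obstacle-free grid shows the headroom a better-than-expert policy can exploit. Both helpers are hypothetical.

```python
def cardinal_steps(dx: int, dy: int) -> int:
    """Fewest moves using only up/down/left/right on an open grid (Manhattan distance)."""
    return abs(dx) + abs(dy)


def diagonal_steps(dx: int, dy: int) -> int:
    """Fewest moves when diagonal moves are also allowed on an open grid (Chebyshev distance)."""
    return max(abs(dx), abs(dy))


# Example: goal is 8 cells right and 6 cells down, with no walls in between.
print(cardinal_steps(8, 6), diagonal_steps(8, 6))  # 14 8
```

The gap grows with the length of the diagonal: cardinal moves need `dx + dy` steps, diagonal moves only `max(dx, dy)`, which is why shorter episodes signal that the agent is using actions the expert could not.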


### Maze2d

1. UnconstrainedExpert Setting
- GRIP: `python goal_prox/main.py --prefix grip --alg dpf-est-drop-deep --traj-batch-size 32 --traj-load-path ./expert_datasets/maze2d_full_ac.pt --traj-frac 1.0 --save-interval 1000 --env-name MBRLmaze2d-v0 --num-env-steps 1e7 --num-render 0 --eval-interval 5 --interpolation-dmode exp --pre-num-epochs 5 --prox-lr 0.001`
- Proximity-Drop: `python goal_prox/main.py --prefix prox-drop --alg dpf-drop-deep --traj-batch-size 32 --traj-load-path ./expert_datasets/maze2d_full_ac.pt --traj-frac 1.0 --save-interval 1000 --env-name MBRLmaze2d-v0 --num-env-steps 1e7`
- Proximity: `python goal_prox/main.py --prefix prox --alg dpf-deep --traj-batch-size 32 --traj-load-path ./expert_datasets/maze2d_full_ac.pt --traj-frac 1.0 --save-interval 1000 --env-name MBRLmaze2d-v0 --num-env-steps 1e7`
- BC: `python goal_prox/main.py --prefix bc --alg bc --traj-load-path ./expert_datasets/maze2d_full_ac.pt --traj-frac 1.0 --bc-num-epochs 10000 --eval-num-processes 32 --max-grad-norm -1 --traj-batch-size 32 --env-name MBRLmaze2d-v0`
- GAIL: `python goal_prox/main.py --prefix gail --env-name MBRLmaze2d-v0 --alg gail-deep --disc-lr 0.0001 --traj-load-path ./expert_datasets/maze2d_full_ac.pt --traj-frac 1.0 --action-input True --gail-reward-norm True --num-env-steps 1e7`
- GAIfO: `python goal_prox/main.py --prefix gaifo --env-name MBRLmaze2d-v0 --alg gaifo-deep --disc-lr 0.0001 --traj-load-path ./expert_datasets/maze2d_full_ac.pt --traj-frac 1.0 --gail-reward-norm True --num-env-steps 1e7`

2. ConstrainedExpert Setting
- GRIP: `python goal_prox/main.py --prefix grip --alg dpf-est-drop-deep --traj-batch-size 32 --traj-load-path ./expert_datasets/maze2d_box_01.pt --traj-frac 1.0 --save-interval 1000 --env-name MBRLmaze2d-v0 --num-env-steps 1e7 --num-render 0 --eval-interval 5 --interpolation-dmode exp --pre-num-epochs 5 --prox-lr 0.001`
- Proximity-Drop: `python goal_prox/main.py --prefix prox-drop --alg dpf-drop-deep --traj-batch-size 32 --traj-load-path ./expert_datasets/maze2d_box_01.pt --traj-frac 1.0 --save-interval 1000 --env-name MBRLmaze2d-v0 --num-env-steps 1e7`
- Proximity: `python goal_prox/main.py --prefix prox --alg dpf-deep --traj-batch-size 32 --traj-load-path ./expert_datasets/maze2d_box_01.pt --traj-frac 1.0 --save-interval 1000 --env-name MBRLmaze2d-v0 --num-env-steps 1e7`
- BC: `python goal_prox/main.py --prefix bc --alg bc --traj-load-path ./expert_datasets/maze2d_box_01.pt --traj-frac 1.0 --bc-num-epochs 10000 --eval-num-processes 32 --max-grad-norm -1 --traj-batch-size 32 --env-name MBRLmaze2d-v0`
- GAIL: `python goal_prox/main.py --prefix gail --env-name MBRLmaze2d-v0 --alg gail-deep --disc-lr 0.0001 --traj-load-path ./expert_datasets/maze2d_box_01.pt --traj-frac 1.0 --action-input True --gail-reward-norm True --num-env-steps 1e7`
- GAIfO: `python goal_prox/main.py --prefix gaifo --env-name MBRLmaze2d-v0 --alg gaifo-deep --disc-lr 0.0001 --traj-load-path ./expert_datasets/maze2d_box_01.pt --traj-frac 1.0 --gail-reward-norm True --num-env-steps 1e7`

### Fetch-Pick

1. UnconstrainedExpert Setting
- GRIP: `python goal_prox/main.py --use-proper-time-limits --prefix grip --alg dpf-est-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta 0.99 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --lr 1e-4 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --box-ub 0.1 --num-render 0 --num-processes 64 --interpolation-dmode exp --pre-num-epochs 2 --entropy-coef 0.001 --prox-lr 0.001`
- Proximity-Drop: `python goal_prox/main.py --use-proper-time-limits --prefix prox-drop --alg dpf-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.0001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --lr 1e-4 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --box-ub 0.1 --num-processes 64`
- Proximity: `python goal_prox/main.py --use-proper-time-limits --prefix prox --alg dpf-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.0001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --lr 1e-4 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --box-ub 0.1 --num-processes 64`
- BC: `python goal_prox/main.py --use-proper-time-limits --prefix bc --alg bc --traj-val-ratio 0 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --normalize-env False --bc-num-epochs 1000 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --box-ub 0.1 --num-processes 64`
- GAIL: `python goal_prox/main.py --use-proper-time-limits --prefix gail --alg gail-deep --num-mini-batch 32 --num-epochs 10 --lr 0.0001 --disc-lr 0.0001 --action-input True --entropy-coef 0.0001 --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --box-ub 0.1 --num-processes 64`
- GAIfO: `python goal_prox/main.py --use-proper-time-limits --prefix gaifo --alg gaifo-deep --num-mini-batch 32 --num-epochs 10 --lr 0.0001 --disc-lr 0.0001 --action-input False --entropy-coef 0.0001 --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --box-ub 0.1 --num-processes 64`

2. ConstrainedExpert Setting
- GRIP: `python goal_prox/main.py --use-proper-time-limits --prefix grip --alg dpf-est-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta 0.99 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --lr 1e-4 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --num-render 0 --num-processes 64 --interpolation-dmode exp --pre-num-epochs 2 --entropy-coef 0.001 --prox-lr 0.001`
- Proximity-Drop: `python goal_prox/main.py --use-proper-time-limits --prefix prox-drop --alg dpf-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.0001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --lr 1e-4 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --add-cdf-regularizer False --num-processes 64`
- Proximity: `python goal_prox/main.py --use-proper-time-limits --prefix prox --alg dpf-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.0001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --lr 1e-4 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --add-cdf-regularizer False --num-processes 64`
- BC: `python goal_prox/main.py --use-proper-time-limits --prefix bc --alg bc --traj-val-ratio 0 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --normalize-env False --bc-num-epochs 1000 --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --num-processes 64`
- GAIL: `python goal_prox/main.py --use-proper-time-limits --prefix gail --alg gail-deep --num-mini-batch 32 --num-epochs 10 --lr 0.0001 --disc-lr 0.0001 --action-input True --entropy-coef 0.0001 --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --num-processes 64`
- GAIfO: `python goal_prox/main.py --use-proper-time-limits --prefix gaifo --alg gaifo-deep --num-mini-batch 32 --num-epochs 10 --lr 0.0001 --disc-lr 0.0001 --action-input False --entropy-coef 0.0001 --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpick_box_01.pt --env-name FetchPickAndPlaceDiffHoldoutTS150-v0 --num-processes 64`
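The Fetch and WidowX commands also set `--pf-uncert-scale`. One plausible reading, which is an assumption not confirmed by this README, is that the progress estimate comes from an ensemble and that `--pf-uncert-scale` weights a penalty on the ensemble's disagreement, discouraging the agent from exploiting states where the estimate is unreliable. A hypothetical sketch of such a penalized reward:

```python
import statistics
from typing import Sequence


def penalized_progress_reward(
    ensemble_prev: Sequence[float],
    ensemble_next: Sequence[float],
    uncert_scale: float = 0.001,
) -> float:
    """ASSUMPTION: progress gain minus a penalty on ensemble disagreement at s_{t+1}.

    ensemble_prev / ensemble_next hold each ensemble member's progress estimate
    for s_t and s_{t+1}; disagreement is their population standard deviation.
    """
    gain = statistics.fmean(ensemble_next) - statistics.fmean(ensemble_prev)
    penalty = uncert_scale * statistics.pstdev(ensemble_next)
    return gain - penalty


# Two ensemble members agree on s_t but disagree on s_{t+1}.
print(penalized_progress_reward([0.25, 0.25], [0.5, 1.0], uncert_scale=0.5))  # 0.375
```

Penalizing disagreement at the next state rather than the current one targets exactly the states the agent is trying to move into, which is where an over-optimistic progress estimate would do the most harm.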


### Fetch-Push

1. UnconstrainedExpert Setting
- GRIP: `python goal_prox/main.py --use-proper-time-limits --prefix grip --alg dpf-est-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0 --box-ub 0.05 --num-processes 64 --pre-num-epochs 5 --prox-lr 0.0001`
- Proximity-Drop: `python goal_prox/main.py --use-proper-time-limits --prefix prox-drop --alg dpf-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0 --box-ub 0.05`
- Proximity: `python goal_prox/main.py --use-proper-time-limits --prefix prox --alg dpf-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0 --box-ub 0.05`
- BC: `python goal_prox/main.py --use-proper-time-limits --prefix bc --alg bc --eval-num-processes 20 --traj-val-ratio 0 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --normalize-env False --bc-num-epochs 1000 --env-name FetchPushEnvCustomTS500-v0 --box-ub 0.05`
- GAIL: `python goal_prox/main.py --use-proper-time-limits --prefix gail --alg gail-deep --num-mini-batch 32 --num-epochs 10 --disc-lr 0.0001 --entropy-coef 0.001 --action-input True --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0 --box-ub 0.05`
- GAIfO: `python goal_prox/main.py --use-proper-time-limits --prefix gaifo --alg gaifo-deep --num-mini-batch 32 --num-epochs 10 --disc-lr 0.0001 --entropy-coef 0.001 --action-input False --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0 --box-ub 0.05`

2. ConstrainedExpert Setting
- GRIP: `python goal_prox/main.py --use-proper-time-limits --prefix grip --alg dpf-est-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0 --num-processes 64 --pre-num-epochs 5 --prox-lr 0.0001`
- Proximity-Drop: `python goal_prox/main.py --use-proper-time-limits --prefix prox-drop --alg dpf-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0`
- Proximity: `python goal_prox/main.py --use-proper-time-limits --prefix prox --alg dpf-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --pf-delta=0.99 --entropy-coef 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0`
- BC: `python goal_prox/main.py --use-proper-time-limits --prefix bc --alg bc --traj-val-ratio 0 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --normalize-env False --bc-num-epochs 1000 --env-name FetchPushEnvCustomTS500-v0`
- GAIL: `python goal_prox/main.py --use-proper-time-limits --prefix gail --alg gail-deep --num-mini-batch 32 --num-epochs 10 --disc-lr 0.0001 --entropy-coef 0.001 --action-input True --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0`
- GAIfO: `python goal_prox/main.py --use-proper-time-limits --prefix gaifo --alg gaifo-deep --num-mini-batch 32 --num-epochs 10 --disc-lr 0.0001 --entropy-coef 0.001 --action-input False --gail-reward-norm True --num-env-steps 1e8 --traj-load-path ./expert_datasets/fetchpush_box_005.pt --lr 1e-4 --env-name FetchPushEnvCustomTS500-v0`

### WidowX-Pick

1. ConstrainedExpert Setting
- GRIP: `python goal_prox/main.py --use-proper-time-limits --prefix grip --alg dpf-est-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/widowx_v2.pt --env-name WidowXLiftCube-v2 --add-cdf-regularizer False --entropy-coef 1e-5`
- Proximity-Drop: `python goal_prox/main.py --use-proper-time-limits --prefix prox-drop --alg dpf-drop-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/widowx_v2.pt --env-name WidowXLiftCube-v2 --add-cdf-regularizer False --entropy-coef 1e-5`
- Proximity: `python goal_prox/main.py --use-proper-time-limits --prefix prox --alg dpf-deep --num-mini-batch 32 --num-epochs 10 --exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 --num-env-steps 1e8 --traj-load-path ./expert_datasets/widowx_v2.pt --env-name WidowXLiftCube-v2 --add-cdf-regularizer False --entropy-coef 1e-5`
- BC: `python goal_prox/main.py --use-proper-time-limits --prefix bc --alg bc --eval-num-processes 20 --traj-val-ratio 0 --traj-load-path ./expert_datasets/widowx_v2.pt --env-name WidowXLiftCube-v2 --normalize-env False --bc-num-epochs 1000 --num-render 0`
- GAIL: `python goal_prox/main.py --use-proper-time-limits --prefix gail --alg gail-deep --num-mini-batch 32 --num-epochs 10 --disc-lr 0.0001 --entropy-coef 1e-5 --action-input True --gail-reward-norm True --num-env-steps 1e8 --lr 1e-5 --traj-load-path ./expert_datasets/widowx_v2.pt --env-name WidowXLiftCube-v2`
- GAIfO: `python goal_prox/main.py --use-proper-time-limits --prefix gaifo --alg gaifo-deep --num-mini-batch 32 --num-epochs 10 --disc-lr 0.0001 --entropy-coef 1e-5 --action-input True --gail-reward-norm True --num-env-steps 1e8 --lr 1e-5 --traj-load-path ./expert_datasets/widowx_v2.pt --env-name WidowXLiftCube-v2`
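The reported results average over four random seeds. A small launcher that repeats one of the commands above across seeds might look like the following. It is a hypothetical helper, not part of the repository; it assumes `goal_prox/main.py` accepts a `--seed` argument (verify this against its argument parser), and the seed values 0 to 3 are arbitrary. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List

# WidowX-Pick GRIP command from above; replace with any other command in this README.
BASE_CMD = (
    "python goal_prox/main.py --use-proper-time-limits --prefix grip "
    "--alg dpf-est-drop-deep --num-mini-batch 32 --num-epochs 10 "
    "--exp-sample-size 4096 --exp-buff-size 4096 --pf-uncert-scale 0.001 "
    "--num-env-steps 1e8 --traj-load-path ./expert_datasets/widowx_v2.pt "
    "--env-name WidowXLiftCube-v2 --add-cdf-regularizer False --entropy-coef 1e-5"
)


def seeded_commands(base: str, seeds: List[int]) -> List[List[str]]:
    """Return one argv list per seed by appending a seed argument to the base command."""
    cmds = []
    for seed in seeds:
        argv = shlex.split(base)
        # ASSUMPTION: goal_prox/main.py accepts --seed; check its argument parser first.
        argv += ["--seed", str(seed)]
        cmds.append(argv)
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute them one after another only when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(seeded_commands(BASE_CMD, [0, 1, 2, 3]))
```

Runs execute sequentially when `dry_run=False`; on a cluster, it is usually better to hand the printed commands to a job scheduler.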

## Quantitative Results
For all environments, we run experiments with four random seeds, and each evaluation checkpoint averages results over 180 episodes. We report the average episode length across all evaluation trajectories from the final trained policy, including unsuccessful attempts. This length metric measures the policy's optimality and ability to leverage the unconstrained action space for faster goal completion.
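The metric described above can be sketched as follows: average the lengths of all evaluation episodes (failed ones included, assumed here to run until the time limit) per seed, then aggregate across seeds. The helpers are hypothetical, and this README does not say how variability across seeds is summarized; the population standard deviation is used only as an example.

```python
from statistics import fmean, pstdev
from typing import Dict, Sequence


def mean_episode_length(lengths: Sequence[int]) -> float:
    """Average length over all evaluation episodes, unsuccessful ones included."""
    return fmean(lengths)


def aggregate_over_seeds(per_seed_lengths: Dict[int, Sequence[int]]) -> Dict[str, float]:
    """Per-seed mean episode length, then mean and spread across seeds."""
    per_seed = [mean_episode_length(v) for v in per_seed_lengths.values()]
    return {"mean": fmean(per_seed), "std": pstdev(per_seed)}


# Toy data: 150 marks a failed episode that hit a 150-step time limit.
runs = {0: [40, 60, 150, 150], 1: [50, 70, 90, 150]}
print(aggregate_over_seeds(runs))  # {'mean': 95.0, 'std': 5.0}
```

Counting failures at their full length means a policy cannot look efficient by succeeding quickly on easy episodes while timing out on the rest.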

![quantitative results](images/quantitative_results.png)
*Results: UnconstrainedExpert settings (top) and ConstrainedExpert settings (bottom). LfCD-GRIP achieves strong performance in both settings: it remains competitive in constrained regimes and consistently outperforms the baselines when the agent's action space is expanded.*



