# Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations

This repository is the official implementation of Smooth Policy Regularisation from Demonstrations (SPReD), built on the TD3 algorithm.

## Environment
Our method and the baselines are evaluated on eight sparse-reward robotic tasks simulated with the [MuJoCo](https://mujoco.org) physics engine:
- FetchPush: Pushing a block to a target position on a tabletop.
- FetchSlide: Striking a puck so that it slides to a target beyond the arm’s reach.
- FetchPickAndPlace: Picking up a block and placing it at a target position in 3D space.
- FetchStack2 and FetchStack3: Stacking two or three blocks precisely in a specified configuration.
- ManipulateBlock, ManipulateEgg and ManipulatePen: Rotating a block, an egg-shaped object or a pen-shaped object in the hand to a target orientation.

The Fetch environments use a 7-DoF robotic arm with a parallel gripper, while the Shadow Dexterous Hand environments are based on a 24-DoF anthropomorphic robotic hand.
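In these goal-conditioned tasks, "sparse reward" means the agent receives -1 at every step until the achieved goal is within a small distance of the desired goal, and 0 once it is. Below is a minimal sketch of that reward. It assumes the 0.05 m threshold used by the Gymnasium-Robotics Fetch tasks (the stacking and Shadow Hand tasks use their own success criteria), and `sparse_reward` is a hypothetical helper, not a function from this repository.

```python
import numpy as np

# Assumption: 0.05 m is the default success threshold of the Gymnasium-Robotics
# Fetch tasks; the stacking and Shadow Hand tasks use their own criteria.
DISTANCE_THRESHOLD = 0.05


def sparse_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """Hypothetical helper: 0.0 if the goal is reached, -1.0 otherwise (batched)."""
    diff = np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    distance = np.linalg.norm(diff, axis=-1)
    return np.where(distance > threshold, -1.0, 0.0).astype(np.float32)


# Block position at the end of an episode vs. the target position (metres).
achieved = np.array([[1.30, 0.75, 0.42]])
desired = np.array([[1.20, 0.90, 0.42]])

print(sparse_reward(achieved, desired))   # [-1.]  about 0.18 m away: failure
# Relabelling the goal with the position that was actually reached, as hindsight
# experience replay does, turns the same transition into a success.
print(sparse_reward(achieved, achieved))  # [0.]
```

Because almost every random episode yields only -1 rewards, learning signal is scarce, which is why goal relabelling and demonstrations help in these tasks.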

The [block stacking tasks](https://github.com/CDMCH/gym-fetch-stack) were built for [this paper](https://github.com/CDMCH/ddpg-curiosity-and-multi-criteria-her?tab=readme-ov-file) using the old [Gym](https://www.gymlibrary.dev/index.html) interface, while the other tasks are accessed through [Gymnasium](https://robotics.farama.org).
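The two interfaces differ mainly in the signatures of `reset` and `step`: old Gym returns a single `done` flag, whereas Gymnasium splits it into `terminated` and `truncated` and also returns an `info` dict from `reset`. The sketch below illustrates that difference with a small adapter. `OldGymToGymnasium` and `ToyOldGymEnv` are hypothetical names for illustration only; they are not part of this repository, Gym or Gymnasium.

```python
from typing import Any, Dict, Optional, Tuple


class OldGymToGymnasium:
    """Hypothetical adapter giving an old-Gym env a Gymnasium-style API.

    Old Gym:   reset() -> obs
               step(a) -> (obs, reward, done, info)
    Gymnasium: reset(seed=None) -> (obs, info)
               step(a) -> (obs, reward, terminated, truncated, info)
    """

    def __init__(self, env: Any) -> None:
        self.env = env

    def reset(self, seed: Optional[int] = None) -> Tuple[Any, Dict]:
        if seed is not None and hasattr(self.env, "seed"):
            self.env.seed(seed)  # old Gym seeded through a separate method
        return self.env.reset(), {}

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, Dict]:
        obs, reward, done, info = self.env.step(action)
        # Old Gym's TimeLimit wrapper flagged time-outs via this info key.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = bool(done) and not truncated
        return obs, reward, terminated, truncated, info


class ToyOldGymEnv:
    """Stand-in for an old-Gym env whose episodes hit a time limit after 3 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any):
        self.t += 1
        done = self.t >= 3
        info = {"TimeLimit.truncated": True} if done else {}
        return self.t, -1.0, done, info


env = OldGymToGymnasium(ToyOldGymEnv())
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(action=None)
print(obs, terminated, truncated)  # 3 False True
```

Separating time-limit truncation from true termination matters for TD learning: a truncated episode should still bootstrap from the next state's value, while a terminated one should not.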

## Installation

```bash
conda create --name robotics python=3.8
conda activate robotics
pip install -r requirements.txt
```
For stacking tasks, install [gym_fetch_stack](https://github.com/CDMCH/gym-fetch-stack).

## Demonstration
We aim to use demonstrations efficiently regardless of their quantity and quality. The demonstrations are stored in the _Demonstrations_ folder and are named after their environments. For reproducibility, its subfolders also contain the demonstrations used to produce our main results and to test the effects of demonstration quantity and quality.
Further details on how the demonstrations were collected, and on their quantity and quality, are given in our paper.

## Usage
Because Gym and Gymnasium are configured differently, we provide two test scripts: one for the six standard tasks and one for the two stacking tasks.

**For the standard tasks:**
```bash
python Standard_test.py --env FetchPickAndPlace-v2
```
**For block stacking tasks:** 
```bash
python Stacking_test.py --env FetchStack2SparseStage3-v1
```
Each script runs both training and evaluation, and the results are saved to the corresponding paths.
Other environments, demonstrations and hyperparameters can be used as well.

All methods included here are built on TD3 with hindsight experience replay (HER) and an ensemble of critics:
- EnsQ-filter: The ensemble variant of Q-filter with binary imitation decisions.
- SPReD-P: The probabilistic variant of our proposed SPReD, which estimates the probability that the demonstrated action is better than the policy's action and uses it as the imitation weight.
- SPReD-E: The exponential variant of our proposed SPReD, which calibrates imitation strength based on the statistical significance of advantages.
- Nonpara_pairwise and Nonpara_cross: Nonparametric probabilistic methods used to validate the Gaussian assumption in SPReD-P.
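The methods above differ in how they turn critic-ensemble estimates into an imitation weight. The sketch below gives one plausible reading of three of them, assuming K critics evaluated at the demonstrated action and at the policy's action: a binary Q-filter decision on the ensemble mean (EnsQ-filter-like), a Gaussian probability that the demonstration is better (SPReD-P-like), and a distribution-free pairwise estimate (in the spirit of Nonpara_pairwise). The function names, the use of per-member advantages and the squared-error behaviour-cloning term are assumptions for illustration, not the code in this repository. SPReD-E and Nonpara_cross are not sketched here; their exact formulations are given in the paper.

```python
import math

import numpy as np


def ensemble_advantages(q_demo, q_pi):
    """Per-member advantage Q_k(s, a_demo) - Q_k(s, pi(s)) for K critics (K >= 2)."""
    return np.asarray(q_demo, dtype=float) - np.asarray(q_pi, dtype=float)


def binary_weight(q_demo, q_pi):
    """Q-filter-style decision on the ensemble mean: imitate (1.0) or not (0.0)."""
    return float(np.mean(ensemble_advantages(q_demo, q_pi)) > 0.0)


def gaussian_prob_weight(q_demo, q_pi, eps=1e-8):
    """P(advantage > 0), assuming the advantage is Gaussian across the ensemble."""
    adv = ensemble_advantages(q_demo, q_pi)
    z = adv.mean() / (adv.std(ddof=1) + eps)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def pairwise_prob_weight(q_demo, q_pi):
    """Distribution-free estimate: fraction of (i, j) pairs with Q_i(s, a_demo) > Q_j(s, pi(s))."""
    q_demo = np.asarray(q_demo, dtype=float)
    q_pi = np.asarray(q_pi, dtype=float)
    return float(np.mean(q_demo[:, None] > q_pi[None, :]))


def weighted_bc_loss(weights, pi_actions, demo_actions):
    """Hypothetical imitation term: per-sample weight times squared action error, batch-averaged.

    In a TD3 actor update this would be added to -Q(s, pi(s)) with some scale.
    """
    sq_err = np.sum((np.asarray(pi_actions) - np.asarray(demo_actions)) ** 2, axis=-1)
    return float(np.mean(np.asarray(weights) * sq_err))


# Five critics disagree about whether the demonstrated action is better.
q_demo = [1.0, 1.2, 0.8, 1.1, 0.9]
q_pi = [0.9, 1.0, 1.0, 0.95, 0.85]
print(binary_weight(q_demo, q_pi),
      round(gaussian_prob_weight(q_demo, q_pi), 2),
      pairwise_prob_weight(q_demo, q_pi))  # 1.0 0.65 0.56
```

In this example the binary filter imitates fully, while both probabilistic weights reflect the critics' disagreement and scale imitation down smoothly, which is the behaviour the SPReD descriptions above point to.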

## Results
![main results](image.png)