# Sample Efficient Reward Augmentation (SERA) in Offline-to-Online Reinforcement Learning

This repository contains the official implementation of Sample Efficient Reward Augmentation (SERA).


## Environment
|Package|Version|
|:---|:---|
|python|3.10.11|
|JAX|0.4.1|
|Gym-mujoco|0.23.1|
|D4RL|1.1|

## Computing Resources
|Component|Specification|
|:---|:---|
|Operating system|Linux|
|GPU|NVIDIA A100 / V100|

## Tasks and datasets
Experiments use the [D4RL](https://github.com/Farama-Foundation/D4RL) benchmark datasets.
## Baselines and Algorithms
|Algorithm|Tasks|
|:---|:---|
|CQL-SERA|gym|
|CalQL-SERA|gym|
|CQL|gym|
|CalQL|gym|
|TD3+BC-SERA|gym|
|TD3+BC|gym|

The following algorithms will be available in this project soon:

|Algorithm|Tasks|
|:---|:---|
|IQL-SERA|gym|
|IQL|gym|
|AWAC-SERA|gym|
|AWAC|gym|

Notably, AntMaze tasks test an algorithm on problems that require exploration and trajectory stitching, while Gym tasks test it on high-dimensional, continuous control.

## Training examples
```sh
# Train CalQL-SERA on Gym tasks
sh ./scripts/run_gym_vcse.sh halfcheetah-medium-v2 0 0 100

# Train CalQL on Gym tasks
sh ./scripts/run_gym.sh walker2d-medium-v2 0 2

# Train CQL on Gym tasks
sh ./scripts/run_gym_cql.sh halfcheetah-medium-v2 0 0

# Train CQL-SERA on Gym tasks
sh ./scripts/run_gym_cql_vcse.sh hopper-medium-v2 0 0 50

# Train TD3+BC on Gym tasks
sh ./scripts/td3_bc_wo_cql.sh halfcheetah-medium-v2 0 0

# Train TD3+BC-SERA on Gym tasks
sh ./scripts/gym_td3bc_vcse_main.sh hopper-medium-v2 0 0 10
```
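
Each command above launches a single run. To repeat one configuration over several Gym datasets, a small shell loop is enough. The sketch below is hypothetical and not part of this repository: it assumes, as the examples suggest, that every script takes the D4RL dataset name as its first argument, and it reuses the trailing arguments `0 0 50` from the CQL-SERA example unchanged, because their meaning is defined inside the script. The dataset list and the `DRY_RUN` switch are illustrative choices. By default the sketch only prints the commands.

```sh
#!/bin/sh
# Hypothetical sweep helper (not part of this repository).
# Assumption: each training script takes the D4RL dataset name as its first
# positional argument. The remaining arguments (0 0 50) are copied unchanged
# from the CQL-SERA example; their meaning is defined inside the script.

# Print commands by default; set DRY_RUN=0 to actually launch them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"    # dry run: show the command only
    else
        "$@"         # execute the command as given
    fi
}

sweep() {
    # Illustrative subset of D4RL Gym datasets; edit as needed.
    for env in halfcheetah-medium-v2 hopper-medium-v2 walker2d-medium-v2; do
        run sh ./scripts/run_gym_cql_vcse.sh "$env" 0 0 50
    done
}

sweep
```

Saved under a hypothetical name such as `sweep.sh`, `DRY_RUN=0 sh sweep.sh` would launch the three runs one after another. The loop runs them sequentially rather than in parallel, so the runs do not compete for the same GPU.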