# README

## Introduction

This code implements a workflow that trains a policy with the Conservative Q-Learning (CQL) algorithm on the Pendulum environment and then generates synthetic data with a diffusion model. The process consists of policy training, data generation, data augmentation, and on-policy evaluation.
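
The introduction lists data augmentation as one stage but does not describe it. The following is a minimal sketch, assuming transitions are stored as NumPy arrays keyed by field name. This is a hypothetical layout and may differ from the one the scripts use. Under that assumption, augmenting the real dataset with diffusion-generated transitions amounts to concatenating the two batches field by field. The helper name `augment_transitions` is hypothetical. Clipping the synthetic actions to Pendulum's torque range of [-2, 2] is also an added assumption, included because generated samples are not guaranteed to respect the action bounds.

```python
import numpy as np

# Pendulum-v1 uses a 1-dim torque action bounded to [-2, 2].
MAX_TORQUE = 2.0


def augment_transitions(real, synthetic, max_torque=MAX_TORQUE):
    """Concatenate real and synthetic transition batches field by field.

    Hypothetical layout: both arguments are dicts mapping a field name
    (e.g. "observations", "actions", "rewards") to an array whose first
    axis indexes transitions. Both must contain an "actions" field.
    """
    if real.keys() != synthetic.keys():
        raise ValueError("real and synthetic batches must have the same fields")
    # Shallow copy so the caller's synthetic dict is not modified.
    synthetic = dict(synthetic)
    # Assumption: keep generated actions inside the valid torque range.
    synthetic["actions"] = np.clip(synthetic["actions"], -max_torque, max_torque)
    # Real transitions come first, synthetic ones are appended after them.
    return {k: np.concatenate([real[k], synthetic[k]], axis=0) for k in real}
```

For instance, four real and two synthetic transitions yield an augmented batch of six. A generated action of `3.0` is stored as `2.0`.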

## Dependencies

The code relies on the following Python libraries:

- `d3rlpy`
- `torch`
- `numpy`
- `denoising_diffusion_pytorch`

## Usage

1. **Policy generation**

   A pre-trained policy can be generated by executing `policy_generator.py`. The generated policy will be saved as `cql_pretrained.d3`.

2. **True value approximation**

   To approximate the true value of the pre-trained policy, run

   ```shell
   python true.py 8000
   ```

   The approximated value will be saved in `truevalue.txt`.

3. **Reproduce results**

   Once the previous two steps are complete, run the following to reproduce the experimental results for one repetition:

   ```shell
   python main.py 10 0 8000
   ```

   For multiple repetitions, run the command above multiple times. All results will be saved in `outputs_mVariations.txt`. The time incurred by training and by data generation will be summarized in `TrainingTimeUsage.txt` and `GenerationTimeUsage.txt`, respectively.
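
Step 2 does not say how `true.py` computes its approximation or what the argument `8000` controls. A common way to approximate a policy's true value is to roll the policy out in the environment and average the discounted returns of the episodes. The sketch below shows only that averaging step. It assumes each episode is given as a list of rewards. The function names and the default discount factor are hypothetical.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * r_t for one episode.

    Accumulates backwards so each reward is discounted once per later step.
    The default gamma is a hypothetical choice, not taken from true.py.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def monte_carlo_value(episodes, gamma=0.99):
    """Average the discounted returns of a list of reward sequences."""
    if not episodes:
        raise ValueError("need at least one episode")
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)
```

With `gamma=0.5`, the episodes `[-1.0, -1.0]` and `[0.0, -2.0]` have returns `-1.5` and `-1.0`, so their average is `-1.25`. More rollouts reduce the variance of this estimate.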

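
Repeating step 3 can be scripted. The sketch below uses hypothetical helper names. It reruns `python main.py 10 0 8000` the requested number of times, one run after another, using the current interpreter via `sys.executable`. The arguments are passed through unchanged because their meaning is not documented here. The sketch also assumes that each run adds its results to `outputs_mVariations.txt` rather than overwriting the file.

```python
import subprocess
import sys

# Arguments copied verbatim from the README; their meaning is not documented here.
MAIN_ARGS = ("10", "0", "8000")


def build_command(script="main.py", args=MAIN_ARGS):
    """Build the command line for one repetition using the current interpreter."""
    return [sys.executable, script, *args]


def run_repetitions(n_reps, script="main.py", args=MAIN_ARGS, dry_run=False):
    """Run the experiment n_reps times sequentially and return the commands used.

    With dry_run=True nothing is executed; the commands are only collected.
    check=True stops the loop if a run exits with a non-zero status.
    """
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    commands = []
    for _ in range(n_reps):
        cmd = build_command(script, args)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

For example, calling `run_repetitions(5)` from the repository root would run five repetitions. Running them sequentially rather than in parallel avoids concurrent writes to the shared output and timing files.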