# Reinforcement Learning with GP Sampling

A PyTorch implementation of the paper *No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes*. The included experiments support the paper's theoretical regret bounds for Thompson sampling in finite-horizon MDPs.
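
For readers new to the idea, here is a minimal sketch of one Thompson sampling step with a Gaussian process: fit a GP posterior to observed values, draw a single function sample over a set of candidate points, and act greedily with respect to that sample. It is illustrative only and is not the code in this repository: it uses NumPy rather than PyTorch, a bandit-style setting rather than a full finite-horizon MDP, and the names `rbf_kernel`, `gp_posterior`, and `thompson_step`, as well as the toy data, are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_posterior(x_train, y_train, x_query, noise_var=1e-2):
    """Posterior mean and covariance of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    k_qq = rbf_kernel(x_query, x_query)
    # Cholesky factorisation avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_qt.T)
    mean = k_qt @ alpha
    cov = k_qq - v.T @ v
    return mean, cov


def thompson_step(x_train, y_train, candidates, rng):
    """Draw one posterior sample and return the index of its maximiser."""
    mean, cov = gp_posterior(x_train, y_train, candidates)
    # A small diagonal jitter keeps the covariance numerically well behaved.
    jitter = 1e-8 * np.eye(len(candidates))
    sample = rng.multivariate_normal(mean, cov + jitter)
    return int(np.argmax(sample))


# Toy usage (hypothetical data): three noisy observations on [0, 1].
rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 50)[:, None]
x_train = np.array([[0.1], [0.5], [0.9]])
y_train = np.array([0.0, 1.0, 0.2])
idx = thompson_step(x_train, y_train, candidates, rng)
print("next point to try:", candidates[idx, 0])
```

Because the action is chosen from a random posterior draw rather than from the posterior mean, points with high uncertainty are occasionally selected, which is how Thompson sampling balances exploration and exploitation.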

# Instructions

## Installation

Install dependencies with these commands:

```sh
conda env create -f environment.yml
conda activate ts_env
```

## Training

Hyperparameters, including the environment name, kernel type, and horizon length, are configurable in `config.yaml`.
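
As a rough illustration of how such hyperparameters might be represented once loaded, the sketch below validates a dictionary of settings against a typed container. Every field name and default (`env_name`, `kernel`, `horizon`, `num_episodes`) and the helper `config_from_dict` are hypothetical and are not taken from this repository's `config.yaml`; check the file itself for the actual keys.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical fields and defaults, for illustration only.
    env_name: str = "toy_env"
    kernel: str = "rbf"
    horizon: int = 10
    num_episodes: int = 100


def config_from_dict(raw):
    """Build a TrainConfig, rejecting keys that are not declared fields."""
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw)


# In practice `raw` would come from parsing config.yaml with a YAML library;
# a literal dict is used here so the example has no third-party dependency.
cfg = config_from_dict({"kernel": "matern", "horizon": 20})
print(cfg)
```

Rejecting unknown keys up front means a misspelled option fails loudly instead of silently falling back to a default.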

Run the training script with:

```sh
python train.py
```