# JoinGym

An efficient and lightweight query optimization environment for reinforcement learning (RL). 

## Quick Start
First, install [Gymnasium](https://gymnasium.farama.org/), [PyTorch](https://pytorch.org/), [cpprb](https://ymd_h.gitlab.io/cpprb/api/), and [wandb](https://docs.wandb.ai/quickstart).
Then, to install JoinGym, run:
```
cd join_optimization
pip install -e .
```

JoinGym follows the Gymnasium API and implements two key methods:
<ol>
  <li>
  <code>state, info = env.reset(options={'query_id': x})</code>
  </li>
  <li>
  <code>next_state, reward, done, _, info = env.step(action)</code>
  </li>
</ol>
Moreover, <code>info['action_mask']</code> is a multi-hot encoding (i.e., MultiBinary) of the valid actions at the current step. The RL algorithm should use this mask so that it learns from, and acts with, valid actions only.
We include implementations of DQN, PPO, SAC and TD3, which were used for producing our results.
For example, to run PPO on the left-deep environment without Cartesian products (CPs), run:

```
python test_ppo.py --disable-cartesian-product
```

To run SAC on the bushy environment with CPs, run:

```
python test_sac.py --enable-bushy
```
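
The `reset`/`step` calls and the `info['action_mask']` convention described above can be sketched as a short interaction loop. This is a minimal sketch, not the code used by the bundled agents: `masked_random_action`, `masked_greedy_action`, `run_episode`, and `ToyMaskedEnv` are hypothetical names introduced for illustration, and `ToyMaskedEnv` is a stand-in that only mimics the return signatures listed above so the loop can run without JoinGym installed.

```python
import random


def masked_random_action(action_mask, rng=random):
    """Sample uniformly among the actions whose mask entry is 1."""
    valid = [a for a, ok in enumerate(action_mask) if ok]
    if not valid:
        raise ValueError("action mask contains no valid action")
    return rng.choice(valid)


def masked_greedy_action(q_values, action_mask):
    """Pick the highest-valued action among the valid ones only."""
    valid = [a for a, ok in enumerate(action_mask) if ok]
    if not valid:
        raise ValueError("action mask contains no valid action")
    return max(valid, key=lambda a: q_values[a])


class ToyMaskedEnv:
    """Hypothetical stand-in (NOT JoinGym): each of n actions may be taken once."""

    def __init__(self, n=4):
        self.n = n
        self.remaining = set()

    def _info(self):
        mask = [1 if a in self.remaining else 0 for a in range(self.n)]
        return {"action_mask": mask}

    def reset(self, options=None):
        # `options` (e.g. {"query_id": x}) is accepted but ignored by this toy.
        self.remaining = set(range(self.n))
        return tuple(sorted(self.remaining)), self._info()

    def step(self, action):
        if action not in self.remaining:
            raise ValueError("invalid action")
        self.remaining.remove(action)
        done = not self.remaining
        # Same 5-tuple shape as in the list above: state, reward, done, _, info.
        return tuple(sorted(self.remaining)), -1.0, done, False, self._info()


def run_episode(env, query_id, policy=masked_random_action):
    """Roll out one episode, choosing only actions allowed by the mask."""
    state, info = env.reset(options={"query_id": query_id})
    total_reward, done = 0.0, False
    while not done:
        action = policy(info["action_mask"])
        state, reward, done, _, info = env.step(action)
        total_reward += reward
    return total_reward
```

A value-based agent would call `masked_greedy_action(q_values, info["action_mask"])` in place of the random policy; the same masking idea applies when computing bootstrapped targets, so that invalid actions never contribute to the maximum.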


## Offline RL

To collect rollouts for offline RL agents, run:

```
cd tests
python generate_rollout.py
```
To train offline RL agents on the collected data, run:

```
cd offline-rl
python convert_trajectories.py --algo=DiscreteBCQ --postfix_exp='_fullmodel_fulldata_6critic.pt'
```

You can also customize the training procedure. For example, you can:
- Train on only a subset of the collected data.
- Design your own models in `offline-rl/models.py`, for example by:
  - changing the model architecture;
  - changing the number of critics;
  - adding dropout, layer normalization, etc.
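
As an illustration of the first option, training on a subset of the data amounts to sampling trajectories before they are passed to the trainer. The sketch below assumes the collected rollouts have already been loaded into a Python list with one entry per trajectory; that layout and the helper name `subsample_trajectories` are assumptions for illustration, not the repository's actual data format or API.

```python
import random


def subsample_trajectories(trajectories, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of the trajectories."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not trajectories:
        return []
    # Keep at least one trajectory so the subset is never empty.
    k = max(1, round(fraction * len(trajectories)))
    rng = random.Random(seed)  # local RNG: does not disturb global random state
    return rng.sample(trajectories, k)
```

Fixing the seed keeps the subset identical across runs, which makes comparisons between model variants trained on the same reduced dataset easier to interpret.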

To evaluate the trained offline RL agents online, run:

```
cd offline-rl
python test_offline.py --model='models/model_DiscreteBCQ_fullmodel_fulldata_6critic.pt' --algo=DiscreteBCQ
```

