# Toy Example on Maze Environment

We use a toy example on the maze environment to illustrate the survival bias in offline RL.

- We first train online DQN/RND agents on this toy environment, and save checkpoints at different timesteps.

    ```bash
    python 1_train_online_agents.py
    ```

- We then use the saved checkpoints to collect trajectories and create an imbalanced offline dataset. In particular, we manually select checkpoints at different levels and build a dataset that is dominated by noisy and sub-optimal trajectories.

    ```bash
    python 2_collect_traj.py
    ```
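
To make the idea of an imbalanced dataset concrete, here is a minimal sketch of mixing trajectories from several checkpoint levels in fixed proportions. It is not the logic of `2_collect_traj.py`: the function `build_imbalanced_dataset`, the level names, the mixture weights, and the `(level, return)` trajectory format are all hypothetical and chosen only for illustration.

```python
import random


def build_imbalanced_dataset(trajs_by_level, mix, total, seed=0):
    """Sample `total` trajectories, drawing from each level in proportion to `mix`.

    Hypothetical helper for illustration; not taken from this repository.
    """
    rng = random.Random(seed)
    dataset = []
    for level, fraction in mix.items():
        n = round(total * fraction)
        # Sample with replacement so a small pool can still fill its quota.
        dataset.extend(rng.choices(trajs_by_level[level], k=n))
    rng.shuffle(dataset)
    return dataset


# Assumed pools: each trajectory is reduced to a (level, return) pair.
pools = {
    "random": [("random", 0.0)] * 10,
    "medium": [("medium", 0.5)] * 10,
    "expert": [("expert", 1.0)] * 10,
}

# Assumed mixture: mostly noisy and sub-optimal data, very few expert trajectories.
mix = {"random": 0.7, "medium": 0.25, "expert": 0.05}

dataset = build_imbalanced_dataset(pools, mix, total=100)
print(len(dataset), sum(1 for level, _ in dataset if level == "expert"))
```

With weights like these, the few high-return trajectories are heavily outnumbered by low-return ones, which is the kind of dataset the toy example is meant to study.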
