# In-Context-Preference-Based-RL
This repository contains the code and experiments for in-context preference-based reinforcement learning.

The code still needs further cleaning, but we have tried to keep it as modular as possible.

The code is structured as follows:

1. ./algos contains the implementations of the transformer models
2. ./ctrls contains the controller class, which works as an interface between a model from ./algos and an environment
3. ./envs contains the implementation of the darkroom environment, copied from the DPT codebase
4. ./policy_conf and ./reward_conf contain configuration files compatible with Hydra, a configuration management library
5. ./trainers contains the code related to training a model, including the trainers, the dataset class and all loss functions.
    5a) ./trainers/dataset.py contains the implementation of a dataset class that is compatible with the PyTorch dataset class
    5b) ./trainers/losses.py contains the losses used to train the models
    5c) ./trainers/policy_model_trainer.py, ./trainers/reward_model_trainer.py and ./trainers/value_model_trainer.py contain the trainers that train the corresponding models
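
As a rough illustration of what "compatible with the PyTorch dataset class" means for item 5a, here is a minimal sketch of a map-style dataset. It is not the code in ./trainers/dataset.py: the class name `PreferencePairDataset`, the `pairs` argument and the `preferred`/`rejected` keys are hypothetical and chosen only for this example. The only requirement taken from PyTorch is that a map-style dataset implements `__len__` and `__getitem__`.

```python
try:
    # Use the real base class when PyTorch is installed.
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the sketch still runs without PyTorch.
    Dataset = object


class PreferencePairDataset(Dataset):
    """Hypothetical map-style dataset of (preferred, rejected) trajectory pairs."""

    def __init__(self, pairs):
        # pairs: iterable of (preferred_trajectory, rejected_trajectory) tuples.
        # The trajectory format here is an assumption made for illustration.
        self.pairs = list(pairs)

    def __len__(self):
        # Number of preference pairs in the dataset.
        return len(self.pairs)

    def __getitem__(self, idx):
        # Return one preference pair as a dictionary.
        preferred, rejected = self.pairs[idx]
        return {"preferred": preferred, "rejected": rejected}


# Toy data: each trajectory is just a list of integers.
dataset = PreferencePairDataset([([0, 1, 2], [0, 3, 4]), ([5, 6], [5, 7])])
```

Because the class follows the map-style protocol, an instance like `dataset` could be passed to `torch.utils.data.DataLoader` in the usual way; real trajectories would typically be tensors rather than lists.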

All other scripts either use classes from the folders above or serve as utilities that support the code.