# In-Context-Preference-Based-RL
This repo contains the code and experiments for in-context preference-based reinforcement learning.

The code still needs further cleaning, but we have tried to keep it as modular as possible.
The code structure is the same as that of darkroom, with changes in several places to accommodate the tasks from metaworld.

The code is structured as follows:

1. ./algos contains the implementations of the transformer models
2. ./ctrls contains the controller class, which acts as the interface between a model from ./algos and an environment
3. ./envs contains the implementation of the darkroom env, copied from the DPT codebase
4. ./policy_conf and ./reward_conf contain configuration files compatible with Hydra, a configuration management library
5. ./trainers contains the code related to training a model, including the trainers, the dataset class and all loss functions.
    5a) ./trainers/dataset.py contains the implementation of a dataset class that is compatible with the PyTorch dataset class
    5b) ./trainers/losses.py contains the losses used to train the models
    5c) ./trainers/policy_model_trainer.py, ./trainers/reward_model_trainer.py and ./trainers/value_model_trainer.py contain the trainers for the corresponding models
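
To illustrate how a controller can sit between a model and an environment, here is a minimal sketch. It is not the code in ./ctrls: the names `Controller`, `AlwaysRightModel`, `LineEnv` and `rollout`, the `predict` method, and the toy 1-D environment are all hypothetical and chosen only for illustration.

```python
from typing import List, Protocol, Tuple

Transition = Tuple[int, int, float]  # (state, action, reward); hypothetical format


class Model(Protocol):
    """Hypothetical model interface: maps the context and current state to an action."""

    def predict(self, context: List[Transition], state: int) -> int: ...


class LineEnv:
    """Toy 1-D environment (hypothetical): reward 1 whenever the agent is at `goal`."""

    def __init__(self, size: int = 5, goal: int = 4, horizon: int = 10) -> None:
        self.size, self.goal, self.horizon = size, goal, horizon
        self.state, self.t = 0, 0

    def reset(self) -> int:
        self.state, self.t = 0, 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves right, any other action moves left; position is clipped.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.size - 1)
        self.t += 1
        reward = 1.0 if self.state == self.goal else 0.0
        return self.state, reward, self.t >= self.horizon


class AlwaysRightModel:
    """Stand-in for a trained model: ignores the context and always moves right."""

    def predict(self, context: List[Transition], state: int) -> int:
        return 1


class Controller:
    """Keeps the in-context history and queries the model for the next action."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.context: List[Transition] = []

    def act(self, state: int) -> int:
        return self.model.predict(self.context, state)

    def observe(self, state: int, action: int, reward: float) -> None:
        self.context.append((state, action, reward))


def rollout(env: LineEnv, ctrl: Controller) -> float:
    """Run one episode and return the sum of rewards."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = ctrl.act(state)
        next_state, reward, done = env.step(action)
        ctrl.observe(state, action, reward)
        total += reward
        state = next_state
    return total
```

With this split, the environment and the model never call each other directly, so either can be swapped out without touching the other.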

All other scripts either use classes from the above folders or serve as utilities that support the code.

Specifically, metaworld_trajectory_generation.py generates preference trajectories from a pre-collected trajectory dataset.
Scripts with _metaworld in their names are used for the metaworld tasks.
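
As a rough picture of what generating preference data from pre-collected trajectories can look like, here is a minimal sketch. It does not reproduce metaworld_trajectory_generation.py: the trajectory format (a dict with a `"rewards"` list), the function names, and the labeling rule (prefer the trajectory with the higher return) are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Trajectory = Dict[str, List[float]]  # hypothetical format: {"rewards": [...]}


def trajectory_return(traj: Trajectory) -> float:
    """Sum of rewards along one trajectory."""
    return sum(traj["rewards"])


def make_preference_pairs(
    trajectories: List[Trajectory], num_pairs: int, seed: int = 0
) -> List[Tuple[int, int, int]]:
    """Sample pairs of distinct trajectories and label each pair.

    Returns tuples (i, j, label), where label is 0 if trajectory i is
    preferred and 1 if trajectory j is preferred. The preference rule
    (higher return wins, ties go to i) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(trajectories)), 2)  # two distinct indices
        ret_i = trajectory_return(trajectories[i])
        ret_j = trajectory_return(trajectories[j])
        label = 0 if ret_i >= ret_j else 1
        pairs.append((i, j, label))
    return pairs
```

For example, `make_preference_pairs(dataset, num_pairs=100)` would return 100 labeled index pairs that a dataset class could then turn into training samples.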