# RICOL

* This repo relies heavily on two existing codebases: [BALROG](https://github.com/balrog-ai/BALROG) and [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)

* Installation: please follow the installation instructions in LLaMA-Factory and BALROG

* Run RICOL (with 2xA40):
    * replace line 33 in `collect/collect_feedback.py` with your OpenAI API key
    * replace the corresponding line in `BALROG/balrog/evaluator.py` with the absolute path to your `collect/reflect2.txt`
    * follow the instructions in `LLaMA-Factory/examples/train.sbatch` to run the code, setting the following variables:
        * Home: make sure that your code is placed under `$Home/incontext_RL`
        * HF_Token: replace this with your actual Hugging Face token
        * RESULT: the folder used for storing log files and checkpoints
        * PROMPT_DIR: replace this with the absolute path to `collect/reflect3.txt`
        * TASKNAME: supported values are `BabyAI-MixedTrainLocal-v0/goto`, `BabyAI-MixedTrainLocal-v0/pickup`, `BabyAI-MixedTrainLocal-v0/pick_up_seq_go_to`, and `BabyAI-MixedTrainLocal-v0/open`
    * the visualization results and the win-rate figure can be found in the `$RESULT` folder
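
Before submitting the job, it may help to sanity-check the variables listed above. The snippet below is a minimal sketch, not part of this repo: `check_settings` is a hypothetical helper, and it assumes the five variables are available as a plain dictionary (for example the process environment, if you export them). It only checks what this README states: every variable is set, `PROMPT_DIR` is an absolute path, `TASKNAME` is one of the four supported tasks, and the code sits under `$Home/incontext_RL`.

```python
import os

# Task names listed in this README as supported values for TASKNAME.
SUPPORTED_TASKS = {
    "BabyAI-MixedTrainLocal-v0/goto",
    "BabyAI-MixedTrainLocal-v0/pickup",
    "BabyAI-MixedTrainLocal-v0/pick_up_seq_go_to",
    "BabyAI-MixedTrainLocal-v0/open",
}

# Variable names as written in this README.
REQUIRED = ("Home", "HF_Token", "RESULT", "PROMPT_DIR", "TASKNAME")


def check_settings(settings):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []

    # Every variable named in the README should have a non-empty value.
    for name in REQUIRED:
        if not settings.get(name):
            problems.append(f"{name} is not set")

    # The README asks for an absolute path to the prompt file.
    prompt = settings.get("PROMPT_DIR", "")
    if prompt and not os.path.isabs(prompt):
        problems.append("PROMPT_DIR should be an absolute path")

    # Only the four task names listed above are supported.
    task = settings.get("TASKNAME", "")
    if task and task not in SUPPORTED_TASKS:
        problems.append(f"TASKNAME {task!r} is not a supported task")

    # The code is expected to live under $Home/incontext_RL.
    home = settings.get("Home", "")
    if home and not os.path.isdir(os.path.join(home, "incontext_RL")):
        problems.append("code not found under $Home/incontext_RL")

    return problems


if __name__ == "__main__":
    # Assumption: the variables were exported into the environment.
    for problem in check_settings(dict(os.environ)):
        print("WARNING:", problem)
```

The helper returns a list rather than raising on the first problem, so all misconfigured variables are reported in one pass.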

* Run RWR:
    * check out the `RWR_final` branch and follow the instructions there