The `outputs` folder contains the outputs of the evaluated models. The folder hierarchy is `<evaluated_model>/F=<feedback_model>/<setting>/<task_type>/<task_name>`.
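
As a rough illustration of this hierarchy, the sketch below splits an output path into its five components. The function name `parse_output_path`, the `OutputLocation` type, and the example path are hypothetical and not part of the repository. It assumes POSIX-style separators and that the feedback-model directory always starts with `F=`.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class OutputLocation(NamedTuple):
    """Components of one entry under the `outputs` folder (hypothetical helper type)."""
    evaluated_model: str
    feedback_model: str
    setting: str
    task_type: str
    task_name: str


def parse_output_path(path: str) -> OutputLocation:
    """Split <evaluated_model>/F=<feedback_model>/<setting>/<task_type>/<task_name>."""
    parts = PurePosixPath(path).parts
    # Drop everything up to and including a leading `outputs` directory, if present.
    if "outputs" in parts:
        parts = parts[parts.index("outputs") + 1:]
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 path components, got {len(parts)}: {path!r}")
    model, feedback, setting, task_type, task_name = parts[:5]
    # Assumption: the second level is always written as `F=<feedback_model>`.
    if not feedback.startswith("F="):
        raise ValueError(f"expected an 'F=' prefix on the feedback model level: {feedback!r}")
    return OutputLocation(model, feedback[len("F="):], setting, task_type, task_name)


# Placeholder names only; real model and task names will differ.
location = parse_output_path("outputs/model_a/F=model_b/max5_p2+tool+cd/some_task_type/some_task")
print(location.feedback_model)
```

Returning a named tuple keeps the five levels addressable by name, which can be convenient when grouping results by setting or task type.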

Example setting: `max5_p2+tool+cd` corresponds to k=5 turns.
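
If the turn budget needs to be recovered from a setting name, something like the following may work. It is only a sketch, assuming that every setting name begins with `max<k>` as in the example above. The helper name `max_turns_from_setting` is hypothetical, and the remaining parts of the setting string are left uninterpreted.

```python
import re


def max_turns_from_setting(setting: str) -> int:
    """Return k from a setting name that starts with `max<k>` (assumed convention)."""
    match = re.match(r"max(\d+)", setting)
    if match is None:
        raise ValueError(f"setting does not start with 'max<k>': {setting!r}")
    return int(match.group(1))


print(max_turns_from_setting("max5_p2+tool+cd"))  # prints 5
```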

The `processed` folder contains the processed dataset of our benchmark.
