===== ABOUT =====

This zip file contains Python code necessary to reproduce all the experiments in the NeurIPS 2021 paper "Risk-Aware Transfer Learning using Successor Features".

Two versions of the framework are available:
1. online version: trains on one task at a time, adds the resulting policy to the policy library, and then moves on to the next task
2. offline version: the training tasks are defined in advance; data is collected from each task sequentially, but that data is used to train the policies for all tasks
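As a rough illustration, the two regimes can be sketched as below. This is a conceptual sketch only; the function names (train, add_to_library, collect, update) are illustrative placeholders, not the repository's actual API.

```python
def train_online(tasks, train, add_to_library):
    """Online regime: train on one task at a time, add the resulting
    policy to the library, then move on to the next task."""
    library = []
    for task in tasks:
        policy = train(task)
        add_to_library(library, policy)
    return library


def train_offline(tasks, collect, update):
    """Offline regime: tasks are fixed in advance; data collected on
    each task is used to update the policies of all tasks."""
    policies = {task: None for task in tasks}
    for source in tasks:
        batch = collect(source)       # collect data from one task at a time
        for target in tasks:          # every policy learns from this batch
            policies[target] = update(policies[target], batch)
    return policies
```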

The training procedures above allow us to build upon the experimental design used in Barreto et al., 2017 (the work on risk-neutral transfer using successor features).

===== INSTRUCTIONS TO RUN =====

To run the motivating example:
	
	python main_example.py 

To run the four-room experiment:

	python main_shapes.py <trial> <agent> <penalty> 

where:
<trial> is the trial number to run, in the range [0, 29]
<agent> is the name of the baseline to run, in [sfql, prql]
<penalty> is a floating-point value for the (negative) risk-aversion parameter beta
The hyper-parameter settings for the agents can be configured in configs/gridworld.cfg.
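For example, a typical invocation (the argument values here are illustrative, not prescribed) might look like:

	python main_shapes.py 0 sfql 0.5

which runs trial 0 of the sfql baseline with a penalty of 0.5.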

The task instances are pre-randomized and fixed across runs. If you wish to regenerate them, simply run:

	python generate_shapes_tasks.py

To run the reacher experiment:

	python main_reacher.py <agent> <approx> <penalty> 

where:
<agent> is the name of the baseline to run, in [sfc51, sfdqn, base, uvfa]
<approx> is the distributional assumption, in [gauss, laplace]
<penalty> is a floating-point value for the (negative) risk-aversion parameter beta
The hyper-parameter settings for the agents can be configured in configs/reacher.cfg.
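For example, a typical invocation (again with illustrative argument values) might look like:

	python main_reacher.py sfdqn gauss 0.5

which runs the sfdqn baseline under the Gaussian distributional assumption with a penalty of 0.5.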

