
## Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith
Keywords: 
JAIR/2022/Proceedings/12440 - Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning.pdf
Project URL: 

### Implementation
_Given the documentation provided by the authors on the method, how much time would it take to re-implement the method from scratch?_

[1]

The authors link to their implementation in Section 5.5 (https://github.com/RodrigoToroIcarte/reward_machines). The README gives an introduction to the method, installation instructions, instructions for running the code with a description of the parameters, a description of the included environments, instructions for running the examples, the location of the scripts that reproduce the results from the paper, instructions for exporting results, and descriptions of some further code files. The code is adequately commented.

### Data
_Given the data description in the documentation, how much effort would it take to either: find the same dataset the authors used, find similar datasets and defend their comparability, or acquire one from scratch?_

[1]

(4/4)

The authors use four simulated environments (Office World, Craft World, Water World, and HalfCheetah-v3) and provide a description, visualisation, and citation for each. All of them are included when installing the implementation.

### Configuration 
_Given the (hyper)parameters, including semantic parameters, of the method: How much effort would it take to acquire the algorithm configurations used for their results, and compare against their budgetary constraints?_

[2]

The authors provide the commands used to run their experiments in the scripts folder, organised per environment. The settings are also summarised per environment in a 'default.py' file for each implementation. Methods adopted from other works use the 'default values' of the original work. However, how each value was obtained is not always explained.

### Experimental Procedure
_Given the experimental set-up of the work, how difficult is it to set up a new experiment, similar to those presented in the original work, with the same procedure?_

[1]

The authors report results averaged over 20/60 runs, with the 25th and 75th percentiles indicating the variation. The metric is the average reward per (environment) step over training steps.
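
For anyone setting up a comparable experiment, the aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `summarise_runs` and the input layout (one list of average-reward-per-step values per run, all measured at the same training checkpoints) are assumptions, and the interpolation used for the percentiles may differ from whatever the authors' plotting scripts use.

```python
from statistics import mean, quantiles


def summarise_runs(runs):
    """Aggregate learning curves across independent runs.

    `runs` is a hypothetical input format: a list of runs, where each run is
    a list of average-reward-per-step values recorded at the same training
    checkpoints. Returns one (mean, 25th percentile, 75th percentile) tuple
    per checkpoint.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compute percentiles")
    n_points = len(runs[0])
    if any(len(run) != n_points for run in runs):
        raise ValueError("all runs must have the same number of checkpoints")

    summary = []
    # zip(*runs) regroups the data so that each tuple holds the values of
    # all runs at a single checkpoint.
    for values in zip(*runs):
        # With n=4, quantiles() returns the three quartile cut points.
        q25, _median, q75 = quantiles(values, n=4, method="inclusive")
        summary.append((mean(values), q25, q75))
    return summary


if __name__ == "__main__":
    # Five toy runs with two checkpoints each (made-up numbers).
    toy_runs = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
    for checkpoint, stats in enumerate(summarise_runs(toy_runs)):
        print(checkpoint, stats)
```

The mean gives the curve itself and the two percentiles give the shaded band around it; using percentiles rather than a standard deviation keeps the band inside the range of observed values.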

### Expertise
_How much effort would it take to acquire the expertise required to reproduce the work independently relying on the available documentation?_

[4]

Requires expertise in RL and reward machines.
