# Meta-Referential Games to Learn Compositional Learning Behaviours

Human beings use compositionality to generalise from past to novel experiences,
assuming that past experiences can be decomposed into fundamental atomic
components that can be recombined in novel ways. We frame this as the ability to
learn to generalise compositionally, and refer to behaviours making use of this
ability as compositional learning behaviours (CLBs). Learning CLBs requires the
resolution of a binding problem (BP). While human beings perform this feat of
intelligence with ease, artificial agents do not. Thus, in order to build
artificial agents able to collaborate with human beings, we develop a novel
benchmark to investigate agents’ abilities to exhibit CLBs by solving a
domain-agnostic version of the BP. Taking inspiration from the field of Emergent
Communication, we propose a meta-learning extension of referential games,
entitled Meta-Referential Games, to support our benchmark, the Symbolic
Behaviour Benchmark (S2B). Baseline results and error analysis show that the S2B
is a compelling challenge that we hope will spur the research community to
develop more capable artificial agents.


The following details how to reproduce the main experiments of the paper.


## Installation :

### Regym :

```bash
cd Experiments/thirdparties/Regym && pip install -e .
```

### ReferentialGym :

```bash
cd Experiments/thirdparties/Regym/regym/thirdparty/ReferentialGym && pip install -e .
```

### Archi :

```bash
cd Experiments/thirdparties/Regym/regym/thirdparty/Archi && pip install -e .
```

### Symbolic Behaviour Benchmark :

```bash
cd SymbolicBehaviourBenchmark && pip install -e .
```

### Miscellaneous :

```bash
pip install wandb ipdb
```
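
The installation steps above can also be collected into a single helper. The following is only a sketch: it assumes it is run from the repository root and that every path listed above is relative to that root. The `install_all` function and the `DRY_RUN` variable are hypothetical names introduced here for illustration. Each install runs in a subshell, so the working directory of the calling shell is left unchanged between packages.

```bash
# Paths copied from the installation sections above,
# assumed to be relative to the repository root.
PACKAGES=(
  "Experiments/thirdparties/Regym"
  "Experiments/thirdparties/Regym/regym/thirdparty/ReferentialGym"
  "Experiments/thirdparties/Regym/regym/thirdparty/Archi"
  "SymbolicBehaviourBenchmark"
)

# Install every package in editable mode, then the extra tools.
# With DRY_RUN=1 the commands are only printed, not executed.
install_all() {
  local pkg
  for pkg in "${PACKAGES[@]}"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "(cd $pkg && pip install -e .)"
    else
      # The subshell keeps the caller's working directory untouched.
      (cd "$pkg" && pip install -e .) || return 1
    fi
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "pip install wandb ipdb"
  else
    pip install wandb ipdb
  fi
}

# Preview the commands without installing anything.
DRY_RUN=1 install_all
```

Calling `install_all` without `DRY_RUN=1` performs the actual installation.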

## Reproduce Experiments :

All relevant scripts can be found in the `Experiments/scripts` folder.
Each script launches a single agent.
Please update the `--seed` hyperparameter to run each agent with different random seeds.
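
Instead of editing a script by hand for every seed, the seed can be rewritten on the fly. This is a sketch under assumptions: it supposes that the `--seed` argument appears inside the script as `--seed=<integer>` or `--seed <integer>`, which should be checked against the actual script first. The `with_seed` helper and the seed values in the usage comment are hypothetical.

```bash
# Print a copy of SCRIPT in which every `--seed` value is replaced by SEED.
# Assumes the flag is written as `--seed=<int>` or `--seed <int>`;
# the original separator is preserved and the script file is not modified.
with_seed() {
  local script="$1" seed="$2"
  sed -E "s/(--seed[= ])[0-9]+/\1${seed}/g" "$script"
}

# Example usage (seed values are placeholders), run from Experiments/scripts:
#   for seed in 10 20 30; do
#     with_seed ./run_lstm_rl.sh "$seed" | bash
#   done
```

Printing the rewritten script to standard output, rather than editing it in place, keeps the versioned scripts unchanged between runs.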

Logging is performed via Weights & Biases, thus you will be required to log in.


### Multi-Agent RL (Section 4.1):

The LSTM agent is run with the following command:

```bash
cd Experiments/scripts && ./run_lstm+org_marl.sh
```

Please update the `--sampling_strategy` argument within the script to change the number of shots $S$.
E.g. : `--sampling_strategy=component-focused-2shots` for $S=2$.

The column corresponding to the Posdis Speaker (PS) is measured by running the following command:

```bash
cd Experiments/scripts && ./run_EoA_PS.sh
```


### Listener-Focused Single-Agent RL (Section 4.2):

The LSTM agent is run with the following command:

```bash
cd Experiments/scripts && ./run_lstm_rl.sh
```

The ESBN agent is run with the following command:

```bash
cd Experiments/scripts && ./run_esbn_rl.sh
```

The DCEM agent is run with the following command:

```bash
cd Experiments/scripts && ./run_dcem_rl.sh
```

To replicate the results of Table 3 with a 5M sampling budget, edit the script named below: set the `--sampling_strategy` argument as explained previously to choose the number of shots $S$, and set the `--nbr_object_centric_samples` argument to choose the number of object-centric samples $O$. Then run the following command:

```bash
cd Experiments/scripts && ./run_lstm_rl_5M.sh
```
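
Sweeping over several values of $S$ and $O$ can be scripted in the same spirit. The sketch below assumes that both arguments appear in the script as `--name=value` or `--name value`, and that the sampling strategy follows the `component-focused-<S>shots` pattern of the $S=2$ example given earlier for other values of $S$ as well. The `with_shots_and_samples` helper and the values in the usage comment are hypothetical.

```bash
# Print a copy of SCRIPT configured for SHOTS shots (S) and
# SAMPLES object-centric samples (O). The script file is not modified.
# Assumes the strategy name follows `component-focused-<S>shots`.
with_shots_and_samples() {
  local script="$1" shots="$2" samples="$3"
  sed -E \
    -e "s/(--sampling_strategy[= ])[A-Za-z0-9_-]+/\1component-focused-${shots}shots/" \
    -e "s/(--nbr_object_centric_samples[= ])[0-9]+/\1${samples}/" \
    "$script"
}

# Example usage (values are placeholders), run from Experiments/scripts:
#   with_shots_and_samples ./run_lstm_rl_5M.sh 2 4 | bash
```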


