# Learning Across the Noise Spectrum: An Approach to Reinforcement Learning with Asynchronous Signals

Official repository for the work "Learning Across the Noise Spectrum: An Approach to Reinforcement Learning with Asynchronous Signals". It implements LANS and the surrounding tools for testing it in simulated RL environments with asynchronous signals.

## Initialization

The repository uses the [Pixi](https://prefix.dev/) package manager, so the first step is to install Pixi by following the instructions at the link above. On Linux or macOS, run the following command:

```bash
curl -fsSL https://pixi.sh/install.sh | bash
```

Once the installation is complete (restarting the terminal may be needed for it to take effect), run the following command to install all the necessary dependencies:

```bash
pixi install
```
---

Before running experiments, create a personal configuration file for private, user-specific settings. Run the following command:

```bash
touch config.personal.yaml
```

The file defines settings that are specific to the user and should not be committed to a public codebase. Fill the file with the following content:

```yaml
wandb:
  entity: [Your Weights & Biases entity]
```

## Running Experiments

The repository relies on the following frameworks:
* [PyTorch](https://pytorch.org/) and [TorchRL](https://pytorch.org/rl/stable/index.html) for the implementation of architectures and algorithms in reinforcement learning.
* [Gymnasium](https://gymnasium.farama.org/) for defining and representing RL environments, especially [MuJoCo](https://gymnasium.farama.org/environments/mujoco/) ones.
* [Hydra](https://hydra.cc/) for experiment configuration.

Configurations are defined in the `conf` directory, organized by experiment (in the `conf/experiment` subdirectory).

[Weights & Biases](https://wandb.ai/site) is an optional framework for logging results. It is enabled by default; to disable it, set the following key in the `conf/base.yaml` file:

```yaml
wandb:
  enable: false  # true by default
```

If Weights & Biases is disabled, the `entity` key in the personal configuration file has no effect, so any string can be used as its value.

Finally, experiments can be run with the following command:

```bash
pixi r train \
	experiment=[experiment_to_run] \
	device=[device_to_use] \
	# name_suffix=[optional_name_suffix]
```

Or by calling the `train.py` script directly:

```bash
pixi r python train.py \
	experiment=[experiment_to_run] \
	device=[device_to_use] \
	# name_suffix=[optional_name_suffix]
```

For example:

```bash
pixi r train experiment=async/halfcheetah device=0
```

Or:

```bash
pixi r python train.py experiment=async/halfcheetah device=0
```

Here is a breakdown of the command's arguments:
* The `experiment` argument selects which experiment to run, based on the corresponding file in `conf/experiment`. Its value must be the path of one such file relative to `conf/experiment`, without the `.yaml` extension (e.g., `async/halfcheetah`). To define new experiments, create new files in that directory with the desired settings.
* The `device` argument selects which device to use. It can be either `"cpu"` or a number identifying a specific GPU.
* The optional `name_suffix` argument defines a suffix to append to the experiment's name, which is then used to identify the run in Weights & Biases. The experiment's name is automatically inferred by joining the name of the environment with the algorithm used (e.g., `"hopper-sac"`). When providing a name suffix, the additional suffix is appended to the name (e.g., `"hopper-sac-lowlearningrate"`).
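
The semantics of the `device` and `name_suffix` arguments can be sketched in a few lines of Python. This is a minimal illustration and not the repository's actual code: `resolve_device` and `build_run_name` are hypothetical helper names, and mapping a GPU number `N` to the string `cuda:N` is an assumption based on PyTorch's device naming.

```python
from typing import Optional, Union


def resolve_device(device: Union[int, str]) -> str:
    """Map the `device` argument to a PyTorch-style device string.

    Hypothetical helper: the `cuda:N` naming is an assumption.
    """
    text = str(device).strip().lower()
    if text == "cpu":
        return "cpu"
    if text.isdigit():
        # A bare number selects the GPU with that index.
        return f"cuda:{int(text)}"
    raise ValueError(f"device must be 'cpu' or a GPU index, got {device!r}")


def build_run_name(env: str, algo: str, name_suffix: Optional[str] = None) -> str:
    """Join the environment and algorithm names, then append the optional suffix."""
    parts = [env, algo]
    if name_suffix:
        parts.append(name_suffix)
    return "-".join(parts)


print(resolve_device(0))                                   # cuda:0
print(build_run_name("hopper", "sac"))                     # hopper-sac
print(build_run_name("hopper", "sac", "lowlearningrate"))  # hopper-sac-lowlearningrate
```

The last two lines reproduce the run names used as examples in the list above.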

## Asynchronous Environments

The `conf/experiment` directory is itself divided into two subdirectories: `standard` and `async`. The experiments in `conf/experiment/standard` are run on the default MuJoCo environments. The experiments in `conf/experiment/async` are instead run on transformed MuJoCo environments to simulate asynchronous signals.
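
Since the `experiment` argument takes a file path relative to `conf/experiment` without the `.yaml` extension, the accepted values can be listed by walking that directory. The following is a small sketch under that assumption; `list_experiments` is a hypothetical helper that is not part of the repository, and it treats every `.yaml` file under the directory as an experiment.

```python
from pathlib import Path
from typing import List


def list_experiments(root: str = "conf/experiment") -> List[str]:
    """Return candidate values for `experiment=`, e.g. "async/halfcheetah"."""
    base = Path(root)
    return sorted(
        # Drop the root prefix and the ".yaml" extension, keep "/" separators.
        path.relative_to(base).with_suffix("").as_posix()
        for path in base.rglob("*.yaml")
    )


if __name__ == "__main__":
    # Run from the repository root to print one experiment name per line.
    for name in list_experiments():
        print(name)
```

Each printed name can be passed directly as `experiment=<name>` to the training command.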

The code defining the simulated asynchronous environments can be found in `src/async_rl/amdp` for the general logic, and `src/async_rl/async_mujoco` for its application to the MuJoCo environments.

Note that, as of now, only simulated versions of Hopper, HalfCheetah, Walker2D, and Reacher are registered. New ones can be added by updating the `src/async_rl/async_mujoco/envs.py` and `src/async_rl/async_mujoco/__init__.py` files appropriately.