# Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions

## Abstract

The Forward-Forward (FF) algorithm is a recently proposed learning procedure for neural networks that replaces the forward and backward passes of backpropagation with two forward passes. However, FF remains largely confined to supervised settings, leaving a gap in domains where learning signals arise more naturally, such as reinforcement learning (RL). In this work, inspired by FF's goodness function, which is computed from layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that combines a goodness function with action conditioning for local RL using temporal difference learning. Despite its simplicity and biological grounding, our approach outperforms state-of-the-art local, backprop-free RL methods on the MinAtar and DeepMind Control Suite benchmarks, and also outperforms algorithms trained with backpropagation on most tasks.
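For intuition, the quantities named above can be written down as follows. This is a minimal sketch under stated assumptions, not the paper's exact formulation: it assumes a hidden layer produces activations $h(s,a) \in \mathbb{R}^d$ from the state $s$ and an encoding of the action $a$ fed in together, that the Q-value is read out as the root mean square of those activations (an FF-style goodness), and that training uses a one-step Q-learning target with reward $r$, next state $s'$ and discount $\gamma$.

$$
Q(s,a) = \sqrt{\frac{1}{d}\sum_{i=1}^{d} h_i(s,a)^2},
\qquad
\delta = r + \gamma \max_{a'} Q(s',a') - Q(s,a)
$$

Under these assumptions, each layer would be trained locally by minimizing $\delta^2$ on its own output while treating the bootstrapped target as a constant. The actual architecture, action encoding and TD target used by ARQ are defined in the paper and code, not here.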

## Repository Structure

- `cb1/` - General implementation for all tasks
- `cb2/` - Specialized implementation for MinAtar/Seaquest-v1 and MinAtar/Asterix-v1 only
- `dmc2gym/` - DeepMind Control Suite to Gymnasium adapter

## Environment Setup

Create and activate the conda environment:

```bash
conda create -n arq python=3.10
conda activate arq
pip install poetry
poetry install
```

For DeepMind Control Suite tasks, install `xvfb`, which provides the virtual display that `xvfb-run` uses on headless machines:
```bash
sudo apt-get install xvfb  # Ubuntu/Debian
```
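Before training, it can help to confirm that the tools used in the setup steps are available. The following is a minimal sketch; the helper name `missing_tools` is hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that cannot be found (one per line).
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tools used by the setup and training commands in this README;
# xvfb-run is only needed for DeepMind Control Suite tasks.
missing="$(missing_tools conda poetry xvfb-run)"
if [[ -n "$missing" ]]; then
  # Join the newline-separated names with spaces for a one-line report.
  echo "Missing: ${missing//$'\n'/ }" >&2
else
  echo "All prerequisites found."
fi
```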

## Supported Environments

### MinAtar Tasks
- `MinAtar/Breakout-v1`
- `MinAtar/Freeway-v1`
- `MinAtar/SpaceInvaders-v1`
- `MinAtar/Seaquest-v1`
- `MinAtar/Asterix-v1`

### DeepMind Control Suite Tasks
- `walker` (walker walk)
- `runner` (walker run)
- `hopper` (hopper hop)
- `cheetah` (cheetah run)
- `reacher_hard` (reacher hard)

## Training

### Using cb1/ (General Tasks)

For MinAtar tasks:
```bash
cd cb1 && poetry run python scripts/train.py <ENV_ID> --seed=<SEED>
```

For DMC tasks:
```bash
cd cb1 && xvfb-run python scripts/train.py <ENV_ID> --seed=<SEED>
```

### Using cb2/ (MinAtar/Seaquest-v1 and MinAtar/Asterix-v1 only)

```bash
cd cb2 && python3 scripts/train.py --task <ENV_ID> --seed <SEED>
```
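The commands above differ by codebase and environment family: `cb1/` uses `poetry run` for MinAtar and `xvfb-run` for DMC, while `cb2/` takes `--task`/`--seed` flags and accepts only two tasks. The sketch below (hypothetical helper `train_cmd`) prints the matching command without running it, assuming any ID that does not start with `MinAtar/` is a DMC task.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the documented training command for
# (codebase, environment ID, seed). Nothing is executed.
train_cmd() {
  local codebase="$1" env_id="$2" seed="$3"
  case "$codebase" in
    cb1)
      if [[ "$env_id" == MinAtar/* ]]; then
        echo "cd cb1 && poetry run python scripts/train.py $env_id --seed=$seed"
      else
        # Assumed to be a DMC task, which needs a virtual display.
        echo "cd cb1 && xvfb-run python scripts/train.py $env_id --seed=$seed"
      fi ;;
    cb2)
      case "$env_id" in
        MinAtar/Seaquest-v1|MinAtar/Asterix-v1)
          echo "cd cb2 && python3 scripts/train.py --task $env_id --seed $seed" ;;
        *)
          echo "cb2 only supports MinAtar/Seaquest-v1 and MinAtar/Asterix-v1" >&2
          return 1 ;;
      esac ;;
    *)
      echo "unknown codebase: $codebase" >&2
      return 1 ;;
  esac
}
```

To execute a printed command in a subshell, so that its `cd` does not change your current directory, run for example `bash -c "$(train_cmd cb1 walker 42)"` from the repository root.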

### Examples

```bash
# MinAtar with cb1/
cd cb1 && poetry run python scripts/train.py MinAtar/Freeway-v1 --seed=42

# DMC with cb1/
cd cb1 && xvfb-run python scripts/train.py walker --seed=42

# MinAtar with cb2/ (Seaquest or Asterix only)
cd cb2 && python3 scripts/train.py --task MinAtar/Seaquest-v1 --seed 42
```
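To train the same task with several seeds, the cb1 MinAtar example can be repeated in a loop. The sketch below writes one log file per run; `sweep_seeds`, `LOG_DIR` and `DRY_RUN` are hypothetical names, and with the default `DRY_RUN=1` it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the cb1 MinAtar command once per seed.
# Usage (from the repository root): sweep_seeds <ENV_ID> <SEED>...
sweep_seeds() {
  local env_id="$1"; shift
  local log_dir="${LOG_DIR:-logs}"
  local seed log
  for seed in "$@"; do
    # Replace "/" in the ID so it can be used in a file name.
    log="$log_dir/${env_id//\//_}_seed${seed}.log"
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      echo "(cd cb1 && poetry run python scripts/train.py $env_id --seed=$seed) > $log 2>&1"
    else
      mkdir -p "$log_dir"
      # The redirection is opened before the subshell's cd, so logs stay in $log_dir.
      (cd cb1 && poetry run python scripts/train.py "$env_id" --seed="$seed") > "$log" 2>&1
    fi
  done
}
```

For example, `DRY_RUN=0 sweep_seeds MinAtar/Freeway-v1 0 1 2 3 4` runs five seeds sequentially.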
