# FPVR-DQN (Atari): Future-Past Visitation Redundancy with Experience Replay

This module implements **FPVR (Future-Past Visitation Redundancy)** as an **off-policy** algorithm with an **experience replay buffer**,
integrated into a **DQN-style** training loop for Atari environments.

## Core mechanism (aligned with the paper)

- **Future representation (slow time-scale)**: successor features \( \psi(s,a) \) (action-conditioned SR), learned via TD to approximate the expected discounted sum of future features.
- **Past representation (short time-scale)**: persistence representation \( c \), a discounted accumulator of whitened features \( \tilde{\phi} \):
  \[
  c \leftarrow \lambda c + \tilde{\phi}(s_t)
  \]
- **FPVR score (future–past redundancy)**: cosine similarity between \( \psi(s,a) \) and \( c \); actions are selected to prefer **lower redundancy**.
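
The three pieces above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code in `agent.py`: the function names `update_persistence` and `redundancy_scores`, the `eps` guard against division by zero, and the toy values are all assumptions made for the example.

```python
import numpy as np

def update_persistence(c, phi_tilde, lam):
    """One step of the accumulator: c <- lam * c + phi_tilde."""
    return lam * c + phi_tilde

def redundancy_scores(psi, c, eps=1e-8):
    """Cosine similarity between each row psi[a] and the vector c.

    psi: (num_actions, d) successor features for one state
    c:   (d,) persistence representation
    """
    psi_norm = np.linalg.norm(psi, axis=1)
    c_norm = np.linalg.norm(c)
    return psi @ c / (psi_norm * c_norm + eps)

# Toy example: 3 actions, 2-dimensional features (values are illustrative only).
c = np.zeros(2)
for phi_tilde in [np.array([1.0, 0.0]), np.array([1.0, 0.0])]:
    c = update_persistence(c, phi_tilde, lam=0.5)

psi = np.array([[1.0, 0.0],    # same direction as c: high redundancy
                [0.0, 1.0],    # orthogonal to c: zero redundancy
                [-1.0, 0.0]])  # opposite to c: negative redundancy
scores = redundancy_scores(psi, c)
least_redundant_action = int(np.argmin(scores))
```

Here the purely redundancy-driven choice is the action whose predicted future points away from the recent past. How the agent blends this score with Q-values is controlled by the action-selection arguments and is not shown.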

## Project structure

```
fpvr_dqn_atari/
├── config.py           # FPVR-DQN args (single source of truth for CLI defaults)
├── model.py            # FPVR network (encoder -> φ -> ψ(s,a)) + Q-network
├── replay_buffer.py    # replay buffer (optional PER / mixed-MC)
├── agent.py            # FPVR agent (ZCA whitening, c update, redundancy, SR/Q training)
├── main.py             # training entrypoint (supports sequential multi-seed runs)
├── evaluate.py         # evaluation (Q-network only)
└── README.md           # this file
```

## Quickstart

Install the dependencies listed in `requirements.txt`, then start training (example: Breakout):

```bash
python fpvr_dqn_atari/main.py --env_name ALE/Breakout-v5
```

## Key arguments (kept consistent with `config.py`)

### FPVR core

- `--fpvr_lambda_c`: \( \lambda \), decay factor for the persistence representation \(c\)
- `--phi_dim`: feature dimension for \( \phi \)
- `--sf_gamma`: successor-feature discount factor
- `--sf_target`: SR target policy (`uniform_policy` or `min_redundancy`)
- `--sr_coeff`: SR loss coefficient
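
To make the two `--sf_target` options concrete, the sketch below shows one plausible way to form the TD target for \( \psi(s,a) \). It is an assumption-laden illustration, not the repository's implementation: the helper `sf_td_target`, the choice to use the current state's features as the immediate term, and the terminal-state handling are all hypothetical.

```python
import numpy as np

def sf_td_target(phi, psi_next, done, gamma, mode="uniform_policy", c=None):
    """Sketch of a TD target for successor features.

    phi:      (d,) features of the current state (assumed immediate term)
    psi_next: (num_actions, d) target-network successor features at s'
    c:        (d,) persistence vector, only needed for "min_redundancy"
    """
    if mode == "uniform_policy":
        # Bootstrap from the average over next actions.
        bootstrap = psi_next.mean(axis=0)
    elif mode == "min_redundancy":
        # Bootstrap from the next action least aligned with c.
        norms = np.linalg.norm(psi_next, axis=1) * np.linalg.norm(c)
        cos = psi_next @ c / (norms + 1e-8)
        bootstrap = psi_next[np.argmin(cos)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # No bootstrapping past a terminal transition.
    return phi + gamma * (1.0 - float(done)) * bootstrap

phi = np.array([1.0, 0.0])
psi_next = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([1.0, 0.0])

t_uniform = sf_td_target(phi, psi_next, done=False, gamma=0.5)
t_min_red = sf_td_target(phi, psi_next, done=False, gamma=0.5,
                         mode="min_redundancy", c=c)
t_terminal = sf_td_target(phi, psi_next, done=True, gamma=0.5)
```

In this reading, `gamma` plays the role of `--sf_gamma`, and the squared error between \( \psi(s,a) \) and the target would be scaled by `--sr_coeff` in the total loss.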

### Whitening (ZCA)

- `--whitening_update_every`, `--whitening_ema_alpha`, `--whitening_eps`, `--cov_buffer`
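
For readers unfamiliar with ZCA whitening, the following sketch computes a whitening matrix from a batch of features. It is a generic textbook version under stated assumptions: the function name `zca_whitening` is hypothetical, the `eps` argument is assumed to correspond to `--whitening_eps`, and the periodic and EMA updates suggested by the other flags are not modeled.

```python
import numpy as np

def zca_whitening(features, eps=1e-5):
    """Return (mean, W) such that (x - mean) @ W has roughly identity covariance.

    features: (n, d) batch of feature vectors
    eps:      added to eigenvalues for numerical stability
    """
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / len(features)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    # ZCA: rotate, rescale each principal axis, rotate back.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W

# Correlated toy features (illustrative data only).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0], [1.5, 0.5]])
features = rng.normal(size=(5000, 2)) @ mixing

mean, W = zca_whitening(features)
whitened = (features - mean) @ W
whitened_cov = whitened.T @ whitened / len(whitened)
```

Because the whitening matrix is symmetric, ZCA keeps the whitened features as close as possible to the original coordinates, unlike plain PCA whitening.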

### Replay buffer / PER

- `--buffer_size`, `--prioritized_replay`, `--prioritized_alpha`, `--prioritized_beta`
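
Assuming `--prioritized_alpha` and `--prioritized_beta` follow the usual prioritized experience replay convention, they act as the exponents in the sketch below. The helper names are hypothetical, and normalizing the weights by the batch maximum is one common variant, not necessarily what `replay_buffer.py` does.

```python
import numpy as np

def per_probabilities(priorities, alpha):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    return scaled / scaled.sum()

def importance_weights(probs, indices, beta):
    """Importance-sampling weights (N * P(i))^(-beta), scaled so the max is 1."""
    n = len(probs)
    weights = (n * probs[indices]) ** (-beta)
    return weights / weights.max()

priorities = np.array([1.0, 1.0, 2.0])
probs = per_probabilities(priorities, alpha=1.0)
uniform_probs = per_probabilities(priorities, alpha=0.0)  # alpha = 0: uniform

rng = np.random.default_rng(0)
indices = rng.choice(len(probs), size=4, p=probs)
weights = importance_weights(probs, indices, beta=1.0)

# Fixed indices make the correction easy to check by hand.
fixed_weights = importance_weights(probs, np.array([0, 2]), beta=1.0)
```

With `alpha=0` sampling is uniform, and with `beta=1` the weights fully correct for the non-uniform sampling, so frequently sampled transitions receive smaller weights.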

### Q-learning (DQN / DDQN)

- `--dqn_type`, `--q_lr`, `--q_gamma`, `--q_target_update`, `--q_net_type`
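
The difference between a DQN and a Double DQN (DDQN) target is easiest to see side by side. The sketch below is generic: `q_target` and its `double` flag are hypothetical names for this example and do not reflect the accepted values of `--dqn_type`.

```python
import numpy as np

def q_target(reward, done, gamma, q_next_target, q_next_online=None, double=False):
    """One-step bootstrapped target for Q(s, a).

    q_next_target: (num_actions,) target-network Q-values at s'
    q_next_online: (num_actions,) online-network Q-values at s' (DDQN only)
    """
    if double:
        # DDQN: the online network picks the action, the target network scores it.
        a_star = int(np.argmax(q_next_online))
        bootstrap = q_next_target[a_star]
    else:
        # DQN: the target network both picks and scores the action.
        bootstrap = np.max(q_next_target)
    return reward + gamma * (1.0 - float(done)) * bootstrap

q_next_target = np.array([1.0, 3.0])
q_next_online = np.array([2.0, 0.5])

dqn = q_target(1.0, False, 0.9, q_next_target)
ddqn = q_target(1.0, False, 0.9, q_next_target, q_next_online, double=True)
terminal = q_target(1.0, True, 0.9, q_next_target)
```

In this toy case the DDQN target is lower than the DQN target because the online network's preferred action is not the one the target network rates highest, which is how DDQN reduces overestimation.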

### Action selection (combining Q and redundancy)

- `--policy_type`, `--policy_alpha`, `--q_abs_threshold`

## Outputs

Training outputs are saved under:

- `fpvr_dqn_atari/runs/<timestamp>-<env_tag>/seed<seed>-<timestamp>/...`

Each run directory typically includes `config.json`, `checkpoint*.pth`, TensorBoard logs, and optional GIFs.

## References

- DQN: Mnih et al., 2015
- Successor Representation: Dayan, 1993

