# Memory-Augmented IRL on Mouse Labyrinth: Reproducibility Code

This repository contains the code to reproduce the core results from the paper. We show that a GRU-based behavior-cloning policy trained on mouse maze-navigation trajectories learns internal representations that mirror the tree structure of the maze without any explicit spatial supervision. The same policy substantially outperforms memoryless baselines under genuine state aliasing.

## Data Dependency

Most experiments use the **Rosenberg et al. (2021)** dataset of mouse maze trajectories. You must clone the companion repository and place it at the expected path:

```bash
git clone https://github.com/DavidRosenberg/MouseLabyrinth Rosenberg-2021-Repository
```

The expected layout (relative to this repo root) is:

```
memory-irl-polished/
├── Rosenberg-2021-Repository/
│   ├── outdata - tf files only/   # pickle files: B1.pkl, B2.pkl, ..., C9.pkl
│   └── code/                      # MM_Traj_Utils.py and related utilities
├── src/
├── scripts/
└── ...
```

`src/rosenberg_data.py` hard-codes this path and will raise an informative error if the data directory is missing.

The radial arm maze and Banino experiments are self-contained and do not require the Rosenberg dataset.

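For readers adapting the data path, the missing-data check described for `src/rosenberg_data.py` might look roughly like the sketch below. The directory names come from the layout above; the function name `find_rosenberg_data` and the constants are hypothetical and are not taken from the actual module.

```python
from pathlib import Path

# Directory names follow the expected layout; the identifiers are illustrative only.
ROSENBERG_DIR = "Rosenberg-2021-Repository"
DATA_SUBDIR = "outdata - tf files only"


def find_rosenberg_data(repo_root="."):
    """Return the trajectory data directory, or raise with setup instructions."""
    data_dir = Path(repo_root) / ROSENBERG_DIR / DATA_SUBDIR
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"Rosenberg et al. (2021) data not found at '{data_dir}'. "
            f"Clone the companion repository into '{ROSENBERG_DIR}/' at the repo root."
        )
    return data_dir
```

Calling `find_rosenberg_data()` from the repo root before launching a long run surfaces a missing clone immediately rather than partway through training.
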
## Setup

```bash
conda create -n structure_analysis python=3.11 -y
conda activate structure_analysis
pip install torch
pip install -r requirements.txt
mkdir -p checkpoints
```

## Reproducing Paper Results

All scripts are run from the repo root with `python scripts/<script>.py`. Unless the table below lists a different destination (stdout or `figures/`), outputs are saved to `checkpoints/`.

Run `bayesian_filter_baseline.py` first: `run_bayes_gap_ci.py` and `run_disagreement.py` both depend on it. Likewise, run `run_structural_obs.py` before `analyze_structural_hidden.py`, which uses the GRU and MLP checkpoints that it trains. All other scripts are independent.

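The ordering constraints above can be written down as a small dependency graph. The sketch below is not part of the repository; it uses the standard-library `graphlib` module (Python 3.9+) to print one valid run order for the dependent scripts. The names `DEPENDS_ON` and `run_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Each script maps to the scripts that must finish before it,
# as stated in the paragraph above.
DEPENDS_ON = {
    "run_bayes_gap_ci.py": {"bayesian_filter_baseline.py"},
    "run_disagreement.py": {"bayesian_filter_baseline.py"},
    "analyze_structural_hidden.py": {"run_structural_obs.py"},
}


def run_order(deps=DEPENDS_ON):
    """Return the scripts in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    # Print the commands rather than executing them.
    for script in run_order():
        print(f"python scripts/{script}")
```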
| Paper Result | Script | Output |
|---|---|---|
| Table 1: GRU vs MLP log-likelihood (LL) and accuracy | `run_structural_obs.py` | `checkpoints/structural_obs_ablation.pt` |
| Fig. 2: Hidden-state geometry and depth probing | `analyze_structural_hidden.py` | `checkpoints/structural_hidden_analysis.pt` |
| Fig. 3: Probe baselines (inherited vs learned) | `run_probe_baselines.py` | printed to stdout |
| Fig. 4: Training dynamics | `run_training_dynamics.py` | `checkpoints/training_dynamics_results.json` |
| Fig. 5: Action probe | `run_action_probe.py` | `checkpoints/action_probe.pt` |
| Table 2: Architecture comparison | `run_arch_baselines.py` | printed to stdout |
| GRU vs Bayes gap (CI) | `bayesian_filter_baseline.py`, then `run_bayes_gap_ci.py` | `checkpoints/bayes_gap_ci.pt` |
| T-maze IRL | `run_tmaze.py` | `checkpoints/tmaze_results.pt` |
| Encoding-swap control | `run_random_encoding.py` | printed to stdout |
| Dimension sweep | `run_dim_sweep.py` | `checkpoints/dim_sweep_d<N>.pt` per dimension |
| Temporal formation | `run_temporal_formation.py` | `checkpoints/temporal_formation.pt` |
| RSA | `run_rsa.py` | printed to stdout |
| GRU vs Bayes contingency | `run_disagreement.py` | `checkpoints/disagreement_analysis.pt` |
| Linear vs nonlinear probe | `run_nonlinear_probe.py` | `checkpoints/nonlinear_probe_results.json` |
| Full stats protocol | `run_statistical_tests.py` | `checkpoints/statistical_tests.pt` |
| Leave-one-out generalization | `run_leave_one_out.py` | printed to stdout |
| PCA ablation | `ablate_pca.py` | `checkpoints/pca_ablation.pt` |
| Per-node accuracy | `ablation_per_node.py` | `checkpoints/ablation_per_node.pt` |
| Radial arm maze: GRU vs MLP | `run_radial_arm.py` | `checkpoints/radial_arm_ablation.pt` |
| Radial arm maze figures | `plot_radial_arm.py` | `figures/radial_arm_*.png` |
| Banino untrained baseline | `run_banino_untrained.py` | `checkpoints/banino_untrained/results.json` |
| Banino wall-distance encoding | `run_banino_wall_distance.py` | `checkpoints/banino_wall_distance/results.json` |
| Water-unrestricted ablation | `run_ablation_unrestricted.py` | `checkpoints/ablation_unrestricted.pt` |

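To inspect a saved result after a run, a small loader that dispatches on the file extension may be convenient. This is a sketch under assumptions: the helper `load_result` is hypothetical, the internal structure of each output file is not documented here, and the `.pt` files are assumed to have been written with `torch.save`. Because such files may contain arbitrary pickled Python objects, the sketch passes `weights_only=False`, which should only be used on files you generated yourself.

```python
import json
from pathlib import Path


def load_result(path):
    """Load a result file written by one of the scripts in the table above."""
    path = Path(path)
    if path.suffix == ".json":
        with path.open() as f:
            return json.load(f)
    if path.suffix == ".pt":
        import torch  # imported lazily so JSON results load without torch

        # Assumption: the file was produced by torch.save and is trusted.
        return torch.load(path, map_location="cpu", weights_only=False)
    raise ValueError(f"Unrecognized result file type: {path.name}")
```

For example, `load_result("checkpoints/training_dynamics_results.json")` returns the parsed JSON content, whose keys depend on what `run_training_dynamics.py` writes.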