# `transfer.py` - Transfer PPO with Guided Advice

This script runs PPO training with a pre-trained teacher agent providing **guided advice** to the student. It supports flexible advice scheduling and in-distribution (IID) thresholding based on energy levels.

## Usage

```bash
python transfer.py --config <config_name> --teacher_dir <teacher_checkpoint_path> [options]
```

## Arguments

| Argument                 | Type    | Default        | Description                                                                 |
|--------------------------|---------|----------------|-----------------------------------------------------------------------------|
| `--config`               | str     | `"pacman"`     | Name of the configuration to load (from `config/`).                         |
| `--teacher_dir`          | str     | _required_     | Directory containing the teacher's checkpoint and related data.             |
| `--check_num`            | str     | `"model-best"` | Which checkpoint file to load from `teacher_dir`.                           |
| `--total_frames`         | int     | `500000`       | Total number of training frames.                                            |
| `--num_transfer`         | int     | `-1`           | Number of frames during which advice is allowed. `-1` means no limit.       |
| `--q_th`                 | float   | `0.1`          | Quantile threshold for teacher's energy-based filtering.                    |
| `--follow_prob`          | float   | `1.0`          | Probability of following the teacher's advice.                              |
| `--linear_decay_advice`  | flag    | `False`        | If set, linearly decay advice over time.                                    |
| `--ex_decay_advice`      | int     | `-1`           | Exponential decay lambda for advice; `-1` disables exponential decay.       |
| `--limit_budget`         | flag    | `False`        | Limit how often advice is given (e.g., fixed budget).                       |
| `--random_IID`           | flag    | `False`        | Use randomly collected IID samples from teacher data.                       |
| `--fix_IID`              | flag    | `False`        | Use fixed IID energies stored as raw `.pt` files.                           |
| `--save_images`          | flag    | `False`        | Save sample visualizations during training.                                 |
| `--debug`                | flag    | `False`        | If set, disables wandb logging and avoids writing to disk.                  |
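
For orientation, the table above can be mirrored by a small `argparse` parser. This is a minimal sketch built only from the table; the actual parser in `transfer.py` may differ, and the path `runs/teacher` is a made-up placeholder.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the argument table (not the script's own code)."""
    parser = argparse.ArgumentParser(description="Transfer PPO with guided advice (sketch)")
    parser.add_argument("--config", type=str, default="pacman")
    parser.add_argument("--teacher_dir", type=str, required=True)
    parser.add_argument("--check_num", type=str, default="model-best")
    parser.add_argument("--total_frames", type=int, default=500000)
    parser.add_argument("--num_transfer", type=int, default=-1)
    parser.add_argument("--q_th", type=float, default=0.1)
    parser.add_argument("--follow_prob", type=float, default=1.0)
    parser.add_argument("--linear_decay_advice", action="store_true")
    parser.add_argument("--ex_decay_advice", type=int, default=-1)
    parser.add_argument("--limit_budget", action="store_true")
    parser.add_argument("--random_IID", action="store_true")
    parser.add_argument("--fix_IID", action="store_true")
    parser.add_argument("--save_images", action="store_true")
    parser.add_argument("--debug", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
# "runs/teacher" is a placeholder path, not a directory the project ships.
args = build_parser().parse_args(
    ["--teacher_dir", "runs/teacher", "--q_th", "0.2", "--limit_budget"]
)
```

Flags default to `False` and become `True` when passed; every other option falls back to the default listed in the table.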

## Notes on Teacher Energy and IID

The script uses energies computed by the teacher to decide when the student follows advice. The energy threshold is the quantile, given by `--q_th`, of the energies of the teacher's IID samples. These samples are loaded from files under the teacher directory; which files are used depends on `--fix_IID` and `--random_IID`.

- If `--fix_IID` is set, precomputed energies are loaded from `iid_raw_energy/<check_num>.pt`.
- Otherwise, IID samples are loaded (randomly collected ones if `--random_IID` is set, grouped ones if not) and their energies are computed.

During training, each state's energy score is compared against this threshold to decide whether the student receives advice.
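
The thresholding step can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the helper names `quantile` and `is_in_distribution` are hypothetical, the energy values are made up, and the direction of the comparison (whether higher or lower energy counts as in-distribution) is not specified here, so it is exposed as a parameter. Plain Python is used so the sketch runs without PyTorch.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, with 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def is_in_distribution(state_energy, threshold, higher_is_iid=True):
    """Compare one state's energy with the threshold.

    The comparison direction is an assumption; set higher_is_iid=False
    if lower energies mark in-distribution states in your setup.
    """
    if higher_is_iid:
        return state_energy >= threshold
    return state_energy <= threshold


# Made-up IID energies, for illustration only.
iid_energies = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
threshold = quantile(iid_energies, q=0.1)  # mirrors --q_th 0.1

# A state is eligible for advice only if it passes the in-distribution check.
eligible = is_in_distribution(5.0, threshold)
```

With `--q_th 0.1` and the assumed direction, roughly 90% of the teacher's own IID samples would pass the check.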

## Advice Scheduling Parameters in Config

To control how and when advice is given, you must specify the following fields in the `BASE_CONFIG` section of your YAML config file:

```yaml
BASE_CONFIG:
  advice_reset_interval: 5000
  interval_advice_rate: 0.125
```

These fields control:
- `advice_reset_interval`: how often the advice budget resets (in frames).
- `interval_advice_rate`: the fraction of timesteps within each interval at which advice can be given.

For a working example, see `multi_grid_locked.yaml` in the `config/` directory.
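
To make the two config fields concrete, here is a minimal sketch of a per-interval advice budget. The class `AdviceBudget` and its `step` method are hypothetical, and how `transfer.py` actually enforces the budget may differ; the sketch assumes the budget is the interval length times the rate, rounded down, and refills at every multiple of `advice_reset_interval`.

```python
class AdviceBudget:
    """Hypothetical per-interval advice budget (not the script's actual class)."""

    def __init__(self, advice_reset_interval, interval_advice_rate):
        self.interval = advice_reset_interval
        # Number of advised timesteps allowed in each interval.
        self.per_interval = int(advice_reset_interval * interval_advice_rate)
        self.remaining = self.per_interval

    def step(self, frame, wants_advice):
        """Return True if advice may be given at this frame."""
        # Refill the budget at the start of every new interval.
        if frame > 0 and frame % self.interval == 0:
            self.remaining = self.per_interval
        if wants_advice and self.remaining > 0:
            self.remaining -= 1
            return True
        return False


# Values from the YAML example above.
budget = AdviceBudget(advice_reset_interval=5000, interval_advice_rate=0.125)

# Ask for advice on every frame across two full intervals.
given = sum(budget.step(frame, wants_advice=True) for frame in range(10000))
```

With the example values, `budget.per_interval` is 625, so two intervals of constant requests yield 1250 advised frames.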

## Checkpoints

All training checkpoints and logs are saved in:

```
<teacher_dir>/ckpts/<auto_generated_exp_name>
```

The subdirectory name is generated automatically from the config name, the teacher checkpoint, and the advice settings.
