# `main.py` - PPO Training Entry Point

This script launches PPO training with configurable settings and optional energy-based regularization using in-distribution (ID) and out-of-distribution (OOD) samples.

## Usage

```bash
python main.py --config <config_name> [options]
```

## Arguments

| Argument           | Type    | Default           | Description                                                                 |
|--------------------|---------|-------------------|-----------------------------------------------------------------------------|
| `--config`         | str     | `"metaworld_indep"` | Name of the configuration to load (from `config/`).                         |
| `--total_frames`   | int     | `1000000`         | Total number of frames to train the agent.                                  |
| `--test_interval`  | int     | `100`             | Number of test evaluations over the course of training.                     |
| `--debug`          | flag    | False             | If set, disables wandb logging and file saving.                             |
| `--save_images`    | flag    | False             | Saves sample visualizations during training.                                |
| `--archive_buffer` | flag    | False             | Enables an extra replay buffer for in-distribution (ID) data. When this is enabled, `--id_file` is ignored. |
| `--id_file`        | str     | `None`            | Path to a `.pt` file containing in-distribution feature maps (ignored if `--archive_buffer` is used). |
| `--ood_file`       | str     | `None`            | Path to a `.pt` file containing out-of-distribution feature maps.           |
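
The table above can be mirrored with a small `argparse` sketch. This is an illustration built only from the argument names, types, and defaults listed in the table, not the actual contents of `main.py`; the helper `resolve_id_source` is a hypothetical name used to show the documented precedence of `--archive_buffer` over `--id_file`.

```python
import argparse


def build_parser():
    """Declare the CLI arguments listed in the table (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="PPO training entry point (sketch)")
    parser.add_argument("--config", type=str, default="metaworld_indep")
    parser.add_argument("--total_frames", type=int, default=1000000)
    parser.add_argument("--test_interval", type=int, default=100)
    # Flags default to False and become True when passed.
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--save_images", action="store_true")
    parser.add_argument("--archive_buffer", action="store_true")
    parser.add_argument("--id_file", type=str, default=None)
    parser.add_argument("--ood_file", type=str, default=None)
    return parser


def resolve_id_source(args):
    """Hypothetical helper: the archive buffer takes precedence over --id_file."""
    if args.archive_buffer:
        return "archive_buffer"
    if args.id_file is not None:
        return "id_file"
    return None


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--config", "multi_grid", "--archive_buffer", "--id_file", "id.pt"]
)
id_source = resolve_id_source(args)
print(args.config, args.total_frames, id_source)
```

The equivalent command line would be `python main.py --config multi_grid --archive_buffer --id_file id.pt`, assuming the config name matches the `multi_grid.yaml` file in `config/`. In this case the `--id_file` value is parsed but not used.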

## Notes on `--id_file` and `--ood_file`

Both files must contain a 4D tensor of shape:

```
[batch_size, C, X, Y]
```

Where:
- `C` is the number of feature channels (e.g., `3` in GridWorld),
- `X` and `Y` are the spatial dimensions of the environment encoding (e.g., grid size),
- `batch_size` is the number of stored ID/OOD samples.

The PPO loss module uses these tensors to enforce margin-based energy regularization. If `--archive_buffer` is enabled, `--id_file` is ignored and the archive buffer provides the in-distribution samples instead.
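
A minimal sketch of producing and checking such a file with PyTorch follows. It assumes the `.pt` file holds the bare tensor written with `torch.save` (not a dict or checkpoint); the sizes, the file name `id_samples.pt`, and the helper `check_feature_maps` are hypothetical and only serve the illustration.

```python
import os
import tempfile

import torch


def check_feature_maps(t):
    """Return the shape if t is a 4D tensor [batch_size, C, X, Y], else raise."""
    if not isinstance(t, torch.Tensor):
        raise ValueError(f"expected a torch.Tensor, got {type(t).__name__}")
    if t.dim() != 4:
        raise ValueError(f"expected 4 dimensions, got shape {tuple(t.shape)}")
    return tuple(t.shape)


# Hypothetical sizes: 128 samples, 3 channels, 8x8 grid.
id_maps = torch.rand(128, 3, 8, 8)

# Write to a temporary directory so the sketch leaves the working tree untouched.
path = os.path.join(tempfile.mkdtemp(), "id_samples.pt")
torch.save(id_maps, path)

# Reload and validate before passing the path to --id_file or --ood_file.
loaded = torch.load(path)
loaded_shape = check_feature_maps(loaded)
print(loaded_shape)
```

Running a check like this before training can catch a missing batch or channel dimension early.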

## Enabling Energy-based Regularization

To use energy-based loss regularization, make sure the following parameters are specified under `BASE_CONFIG` in your config YAML file:

```yaml
BASE_CONFIG:
  margin_in: 12
  margin_out: 14
  lambda_energy: 0.0001
  archive_buffer_size: 3000
```

These control:
- `margin_in`: energy threshold for in-distribution samples.
- `margin_out`: energy threshold for out-of-distribution samples.
- `lambda_energy`: loss coefficient for the energy-based regularization term.
- `archive_buffer_size`: number of in-distribution (ID) samples to store during training (used when `--archive_buffer` is enabled).

For a working example, see `multi_grid.yaml` in the `config/` directory.
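
To make the roles of `margin_in`, `margin_out`, and `lambda_energy` concrete, here is a minimal sketch of one common margin-based form: a squared hinge that penalizes ID energies above `margin_in` and OOD energies below `margin_out`. The exact expression used by the PPO loss module is not documented here, so the formula, its sign convention, and the function name `energy_regularizer` are all assumptions.

```python
def energy_regularizer(id_energies, ood_energies, margin_in, margin_out, lambda_energy):
    """Squared-hinge energy penalty (assumed form, not the project's exact loss).

    ID samples are penalized when their energy exceeds margin_in;
    OOD samples are penalized when their energy falls below margin_out.
    """
    id_penalty = sum(max(0.0, e - margin_in) ** 2 for e in id_energies) / len(id_energies)
    ood_penalty = sum(max(0.0, margin_out - e) ** 2 for e in ood_energies) / len(ood_energies)
    return lambda_energy * (id_penalty + ood_penalty)


# Values taken from the example BASE_CONFIG; the energies are made up.
penalty = energy_regularizer(
    id_energies=[10.0, 13.0],   # 13.0 exceeds margin_in by 1.0
    ood_energies=[15.0, 12.0],  # 12.0 falls short of margin_out by 2.0
    margin_in=12,
    margin_out=14,
    lambda_energy=0.0001,
)
print(penalty)
```

Under this assumed form, samples that already sit on the correct side of their margin contribute nothing, and `lambda_energy` scales the term before it is added to the PPO loss.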


## Checkpoints

All training checkpoints and logs are saved under the directory below (unless `--debug` is set, which disables file saving):

```
./ckpts
```

Each run is saved in a subdirectory named after the experiment name, which is automatically generated from the config.
