# Introduction
This directory contains the code for
- generating AnyMDP tasks
- generating datasets or trajectories for training and validation
- training and validation on AnyMDP tasks
- evaluating OmniRL models in Gymnasium and AnyMDP tasks

All code will be open-sourced after the paper is accepted.

# 1. Generate AnyMDP tasks

Sample AnyMDP tasks with the following command:

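```bash
# --output_path: file path to save the tasks
# --task_number: size of the task set
# --n_states:    number of states n_s
# --n_actions:   number of actions n_a
python ./data/gen_anymdp_task.py \
    --output_path YOUR_TASK_PATH \
    --task_number YOUR_TASK_NUMBER \
    --n_states 16 \
    --n_actions 5
```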

Before doing so, make sure this directory is on your `PYTHONPATH`. You can add it by running:
```bash
export PYTHONPATH="$PYTHONPATH:PATH_TO_YOUR_DIRECTORY"
```

# 2. Generate datasets or trajectories for training and validation

After generating your tasks, you can generate datasets or trajectories from them with the following command:

```bash
# --task_source: FILE or NEW
# --state_num:   number of states n_s
# --action_num:  number of actions n_a
# --max_steps:   sequence length T
# --epochs:      number of sequences
python ./data/gen_anymdp_record.py \
    --output_path YOUR_DATA_PATH \
    --task_file YOUR_TASK_PATH \
    --task_source FILE \
    --state_num 16 \
    --action_num 5 \
    --max_steps 16000 \
    --epochs 256 \
    --workers 64
```

To sample a new random task for each sequence instead of loading tasks from `--task_file`, use `--task_source NEW`.


# 3. Configuration and Training

## Training and Validation
Run the following commands to train and validate the OmniRL model on AnyMDP tasks:
```bash
python projects/OmniRL/train.py projects/OmniRL/config.yaml --configs key1=value1 key2=value2 ...
python projects/OmniRL/validate.py projects/OmniRL/config.yaml --configs key1=value1 key2=value2 ...
```

## Configuration

The `config.yaml` file contains all the configuration needed to run OmniRL. Configuration items are organized as nested keys and sub-keys, and any of them can be overridden by command-line arguments. For instance,
```yaml
model_config:
    state_encode:
        input_type: "Discrete"
```
can be overridden from the command line as follows:
```bash
python train.py config.yaml --model_config.state_encode.input_type="Continuous"
```
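To make the override mechanism concrete, the sketch below shows how a dotted key such as `model_config.state_encode.input_type` can address a nested entry in the parsed YAML. The helpers `set_by_dotted_key` and `apply_overrides` are hypothetical and only illustrate the idea; they are not airsoul's actual parser, and they keep override values as strings without any type conversion.

```python
import copy
from typing import Any, Dict, Iterable


def set_by_dotted_key(config: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Return a copy of `config` with the nested entry at `dotted_key` set to `value`.

    Hypothetical helper for illustration; not airsoul's implementation.
    """
    updated = copy.deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = updated
    for key in parents:
        # Walk down through the sub-keys, creating missing levels as empty dicts.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


def apply_overrides(config: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Apply "key.sub_key=value" strings in order; values stay strings (no type conversion)."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        config = set_by_dotted_key(config, dotted_key, value)
    return config


base = {"model_config": {"state_encode": {"input_type": "Discrete"}}}
updated = apply_overrides(base, ["model_config.state_encode.input_type=Continuous"])
print(updated["model_config"]["state_encode"]["input_type"])  # Continuous
print(base["model_config"]["state_encode"]["input_type"])     # Discrete
```

Copying before modification keeps the base configuration intact, so several override sets can be derived from the same loaded file.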
Below we explain key configuration items in detail.

### General Configuration

- **run_name**: The name airsoul uses to distinguish this run from others in the logs.

- **master_port**: The port used to connect to the master node.

- **load_model_path**: Set to `none` for a cold start, or to a checkpoint path to load the model from.

### Log Configuration

Specifies the log path and whether to use TensorBoard.

### Model Configuration (model_config)

Configuration for the overall model architecture and components, including encoders, decoders, and causal blocks. It defines the structure and behavior of the model during training and inference.

- **max_position_loss_weighting**: Defines the maximum sequence length that the model can handle.
- **context_warmup**: Specifies a loss weighting that increases with context length, as described in the appendices of [EPRNN](https://arxiv.org/pdf/2109.03554).
- **causal_block**: Options include `Transformer`, `GSA`, `GLA`, `MAMBA`, `RWKV6`, `RWKV7`. OmniRL automatically uses causal masks for `Transformer` and `RWKV6` and employs a chunk-wise forward and backward pass. For example, setting `train_config.seg_len` automatically switches `Transformer` to sliding-window attention.
- **state_encode**, **state_decode**, **action_encode**, ...: Specify the encoders and decoders for states, actions, rewards, etc.
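The sketch below illustrates two ideas from the `causal_block` entry: a sliding-window causal mask and the split of a training sequence into segments for chunk-wise processing. It is a generic illustration, not OmniRL's implementation. The function names are hypothetical, and `window` is a free parameter here; how OmniRL derives its attention window from `train_config.seg_len` is not specified above.

```python
from typing import List, Tuple


def sliding_window_causal_mask(length: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Each position sees itself and at most `window - 1` earlier positions,
    and never any later position (causality).
    """
    return [[i - window < j <= i for j in range(length)] for i in range(length)]


def split_into_segments(seq_len: int, seg_len: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) ranges covering a sequence, seg_len steps at a time.

    The last segment is shorter when seg_len does not divide seq_len.
    """
    return [(start, min(start + seg_len, seq_len)) for start in range(0, seq_len, seg_len)]


for row in sliding_window_causal_mask(length=4, window=2):
    print("".join("1" if allowed else "0" for allowed in row))
# 1000
# 1100
# 0110
# 0011
print(split_into_segments(seq_len=10, seg_len=4))  # [(0, 4), (4, 8), (8, 10)]
```

Processing one segment at a time bounds the activation memory per forward and backward pass by the segment length rather than the full sequence length.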

#### Model structures
Not all causal block parameters take effect for every model type. For example, `num_hidden_layers` is used when calling `RWKV7PreTrainedModels`; because we invoke `RWKV7Block` directly and build the causal model externally with `num_layers` blocks, `num_hidden_layers` is not used here. `hidden_ratio` is ignored when `inner_hidden_size` is given explicitly, and `position_encoding_size` applies to `Transformer` only.
```yaml
causal_block:
    model_type: RWKV7
    num_layers: 18
    hidden_size: 512
    inner_hidden_size: 1024
    num_hidden_layers: 24
    dropout: 0.10
    nhead: 4
    hidden_ratio: 4
    position_encoding_size: 12000
    use_layer_norm: True
    use_blockrecurrence: True
    checkpoints_density: -1
    memory_length: 0
    memory_type: MEM
    is_frozen: False
```
The setting above corresponds to the causal model in our paper, with a hidden size of $512$, an inner hidden size of $1024$, and $18$ blocks. The next version of the code will remove redundant parameters for simplicity.
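The parameter rules above can be summarized as a small resolution step. The helper `effective_causal_block` below is hypothetical and only mirrors the rules stated in this section; it is not part of OmniRL. The fallback `inner_hidden_size = hidden_size * hidden_ratio` is an assumption about how `hidden_ratio` is used when no explicit inner size is given.

```python
from typing import Any, Dict


def effective_causal_block(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Drop causal-block parameters that, per the rules above, have no effect.

    Hypothetical helper for illustration; it is not part of OmniRL.
    """
    resolved = dict(cfg)
    # Blocks are stacked externally with `num_layers`, so `num_hidden_layers` is unused
    # (as in the RWKV7 example above).
    resolved.pop("num_hidden_layers", None)
    # `position_encoding_size` only applies to Transformer.
    if resolved.get("model_type") != "Transformer":
        resolved.pop("position_encoding_size", None)
    if resolved.get("inner_hidden_size") is not None:
        # An explicit inner_hidden_size takes precedence, so hidden_ratio is ignored.
        resolved.pop("hidden_ratio", None)
    elif "hidden_ratio" in resolved:
        # Assumption: without an explicit value, the inner size is hidden_size * hidden_ratio.
        resolved["inner_hidden_size"] = resolved["hidden_size"] * resolved.pop("hidden_ratio")
    return resolved


paper_setting = {
    "model_type": "RWKV7", "num_layers": 18, "hidden_size": 512,
    "inner_hidden_size": 1024, "num_hidden_layers": 24,
    "hidden_ratio": 4, "position_encoding_size": 12000,
}
print(effective_causal_block(paper_setting))
# {'model_type': 'RWKV7', 'num_layers': 18, 'hidden_size': 512, 'inner_hidden_size': 1024}
```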

### Training Configuration (train_config)

Settings for training the model.

- **seq_len**: The sequence length loaded into memory during training.
- **seg_len**: The segment length used in the chunk-wise forward and backward pass.
- **lr**, **lr_decay_interval**, **lr_start_step**: OmniRL applies Noam decay with the number of warmup steps given by `lr_decay_interval`; use `lr_start_step` when warm-starting.
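For reference, a Noam schedule combines a linear warmup with inverse-square-root decay. The sketch below uses one common normalized form, in which the learning rate peaks at `lr` when the warmup ends. Treating `lr_decay_interval` as the warmup length follows the description above; the normalization and the use of `lr_start_step` as an offset to the step counter are assumptions, so OmniRL's exact formula may differ.

```python
import math


def noam_lr(step: int, lr: float, lr_decay_interval: int, lr_start_step: int = 0) -> float:
    """Linear warmup for `lr_decay_interval` steps, then decay proportional to 1/sqrt(step).

    Assumptions: the peak learning rate equals `lr`, and `lr_start_step` shifts the
    step counter so a warm-started run resumes partway through the schedule.
    """
    s = max(step + lr_start_step, 1)  # guard against step 0
    warmup = lr_decay_interval
    return lr * min(s / warmup, math.sqrt(warmup / s))


print(noam_lr(50, lr=1.0, lr_decay_interval=100))   # 0.5 (halfway through warmup)
print(noam_lr(100, lr=1.0, lr_decay_interval=100))  # 1.0 (peak)
print(noam_lr(400, lr=1.0, lr_decay_interval=100))  # 0.5 (decayed by sqrt(100 / 400))
```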

### Test Configuration (test_config)

Specifies the configuration used for validation between episodes or during static testing.

### Evaluation Configuration (generator_config)

Specifies the configuration for auto-regressive interaction with the environment during dynamic evaluation.

- **env**: The evaluation environment: `anymdp32x5`, `lake4x4`, `cliff`, `mountaincar12x5`, `pendulum12x5`, or `switch` (multi-agent).
- **task_file**: Use pre-defined tasks (a pickle file generated by `gen_xxx_task.py`). Only valid for AnyMDP task evaluation.
- **action_clip**: Clips actions to fit the environment's action space. For example, if OmniRL's action space has 6 actions and `action_clip` = 4, actions 4 and 5 are mapped to 0 and 1 in the environment.
- **decoding_strategy**: Specifies **T_ini**, **T_fin**, **T_step**, and **decay_type**; the decoding temperature decays from **T_ini** to **T_fin** over **T_step** steps.
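Two of the items above can be sketched in a few lines. The wrap-around (modulo) mapping in `clip_action` is an assumption that is consistent with the `action_clip` example (4 maps to 0, 5 maps to 1). The `"linear"` and `"exponential"` values for `decay_type` are hypothetical names used only to illustrate a temperature schedule from **T_ini** to **T_fin** over **T_step** steps; the options OmniRL actually supports may differ.

```python
def clip_action(action: int, action_clip: int) -> int:
    """Map a model action into the environment's action range (assumed wrap-around)."""
    return action % action_clip


def temperature(step: int, T_ini: float, T_fin: float, T_step: int,
                decay_type: str = "linear") -> float:
    """Decoding temperature moving from T_ini to T_fin over T_step steps, then held at T_fin.

    The decay_type names here are hypothetical.
    """
    progress = min(step / T_step, 1.0)
    if decay_type == "linear":
        return T_ini + (T_fin - T_ini) * progress
    if decay_type == "exponential":
        # Geometric interpolation; requires positive T_ini and T_fin.
        return T_ini * (T_fin / T_ini) ** progress
    raise ValueError(f"unknown decay_type: {decay_type}")


print([clip_action(a, action_clip=4) for a in range(6)])  # [0, 1, 2, 3, 0, 1]
print(temperature(50, T_ini=1.0, T_fin=0.5, T_step=100))   # 0.75
print(temperature(500, T_ini=1.0, T_fin=0.5, T_step=100))  # 0.5
```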

**Parameters not listed above can keep the default values of the recommended configuration, and can be adjusted to suit specific application requirements.**

# 4. Evaluate OmniRL model in Gymnasium and AnyMDP tasks

Run the following commands to let the model interact with any discrete-space environment:

```bash
python generator.py config_anymdp.yaml
python generator.py config_gymnasium.yaml
```
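During this interaction, the decoding temperature from `generator_config.decoding_strategy` controls how actions are chosen. The sketch below shows standard temperature sampling, assuming the model outputs one logit per action; the logits and function names are hypothetical, and this is not OmniRL's decoding code.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert action logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw one action index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.1, 0.0, -1.0]  # hypothetical logits for n_a = 5 actions
print(softmax_with_temperature(logits, 1.0))
print(sample_action(logits, 0.01, rng))  # 0 (near-greedy at very low temperature)
```

Decaying the temperature over time, as `decoding_strategy` describes, moves the agent from more exploratory sampling toward near-greedy action selection.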

# Download pre-generated datasets, tasks and pre-trained models

- The training dataset (Large), with 512K sequences and 6B time steps, can be downloaded from [here](https://www.kaggle.com/datasets/anonymitynobody/omnirl-training-data-d-large).
- The online evaluation task set and the static validation dataset for AnyMDP can be downloaded from [here](https://www.kaggle.com/datasets/anonymitynobody/omnirl-evaluation).
- Pre-trained models ($D_{Large}$) can be downloaded from [here](https://www.kaggle.com/models/anonymitynobody/omnirl-pre-trained-model-d_large).
