
# Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs

This repository provides a framework for training action-constrained reinforcement learning (ACRL) algorithms with the Acceptance-Rejection Method and Augmented MDPs.

### Main Files and Scripts
- **`main.py`**: The primary training script. It accepts options such as `augment_ratio`, `augment_ratio_decay`, and `augment_ratio_decay_freq`.

- **`main_sosac.py`**: A variant of the training script that implements the SOSAC algorithm.

- **`model.py`**: Contains model definitions for the agents used in the training.

- **`agent.py`**: Defines the agent's behavior and interactions with the environment.

- **`agentsosac.py`**: Implements the agent for the SOSAC variant.

- **`Constraint_Check.py`**: A script for checking action constraints in the environment.

- **`Constraint_Proj.py`**: Likely performs constraint projection operations to ensure actions remain within specified bounds.
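
For orientation, the acceptance-rejection idea named in the title can be sketched generically as follows. This is an illustration only, not the code in this repository: the names `is_feasible`, `sample_feasible_action`, and `propose`, the L1-norm constraint, and the `max_tries` cap are all assumptions made for the example.

```python
import random


def is_feasible(action, max_l1_norm=1.0):
    """Hypothetical constraint: the L1 norm of the action must not exceed a budget."""
    return sum(abs(a) for a in action) <= max_l1_norm


def sample_feasible_action(propose, is_feasible, max_tries=1000):
    """Draw proposals until one satisfies the constraint (acceptance-rejection).

    Returns the accepted action and the number of proposals it took.
    """
    for tries in range(1, max_tries + 1):
        action = propose()
        if is_feasible(action):
            return action, tries
    raise RuntimeError("no feasible action found within max_tries")


# Hypothetical proposal distribution: uniform over the box [-1, 1]^2.
rng = random.Random(0)


def propose():
    return [rng.uniform(-1.0, 1.0) for _ in range(2)]


action, tries = sample_feasible_action(propose, is_feasible)
print(action, tries)
```

In this sketch, rejected proposals are simply discarded; only a cheap feasibility check is needed per proposal, in contrast to solving a projection problem for every action.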

### Environment and Configuration
- **`environments/`**: A folder containing environment-related files, providing a range of custom or predefined environments for training.

- **`requirements.txt`**: Lists the dependencies needed to run the project. Install these using `pip install -r requirements.txt`.

### Additional Files
- **`base.py`**: Provides base classes or functions that are extended or used in other parts of the project.
  
- **`multi_step.py`**: Provides functions to handle multi-step processes during training, likely for managing complex action sequences or state transitions.

- **`utils.py`**: Contains utility functions used throughout the project for various supporting operations.

## Getting Started

To start training, use the following command:

```bash
python main.py --prob_id H+M_10 --env_id MO_hopper_M_10_goal_vel3_ccw0001-v0 --augment_ratio 0.2 --augment_ratio_decay 0.99 --augment_ratio_decay_freq 10000 --seed 0
```

### Parameters

- **`augment_ratio`**: Specifies the proportion of AUTO-MDPs to sample during training. For example, setting `--augment_ratio 0.2` means that 20% of the training data will be drawn from AUTO-MDPs.

- **`augment_ratio_decay`**: The multiplicative decay factor applied to `augment_ratio`. It controls how quickly the augmentation ratio decreases over time. For example, setting `--augment_ratio_decay 0.99` reduces the augmentation ratio by 1% each decay cycle.

- **`augment_ratio_decay_freq`**: Defines the frequency of decay cycles for the `augment_ratio`. For instance, `--augment_ratio_decay_freq 10000` will apply the decay every 10,000 steps.
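
Taken together, the three parameters above describe a stepwise exponential schedule. The following is a minimal sketch of that schedule, assuming the decay is applied once every `augment_ratio_decay_freq` steps starting from step 0; the function name `effective_augment_ratio` is hypothetical, and the exact step counting in `main.py` may differ.

```python
def effective_augment_ratio(step, augment_ratio=0.2, decay=0.99, decay_freq=10_000):
    """Augmentation ratio after `step` training steps (illustrative only).

    Assumes one multiplicative decay per completed block of `decay_freq` steps.
    """
    num_decays = step // decay_freq  # completed decay cycles so far
    return augment_ratio * decay ** num_decays


# Print the ratio at a few points in training.
for step in (0, 9_999, 10_000, 100_000, 1_000_000):
    print(step, effective_augment_ratio(step))
```

Under these assumptions, the ratio stays at its initial value for the first 10,000 steps and then shrinks by 1% per cycle, so it decreases gradually rather than dropping to zero at a fixed point.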

