### Feasible Policy Optimization

This project implements the Feasible Policy Optimization (FPO) algorithm on top of the [OmniSafe](https://github.com/PKU-Alignment/omnisafe) framework.

#### Files we implemented

Below are the main files we added or modified for the implementation of the FPO algorithm:

**Algorithm Logic:**

- `fpo.py`: The core implementation of the FPO algorithm.

**Adapter:**

- `fpo_adapter.py`: An environment adapter specific to the FPO algorithm.

**Buffer:**

- `fpo_buffer.py`: The experience replay buffer used by the FPO algorithm.
- `vector_fpo_buffer.py`: An FPO buffer suitable for vectorized environments.

**Model:**

- `fpo_actor_critic.py`: An Actor-Critic model specialized for the FPO algorithm, which includes a feasibility critic.

**Config:**

- `FPO.yaml`: The default configuration file for the FPO algorithm.

**Visualize:**

- `visualize.py`: An example script for visualizing the training results of the FPO algorithm.

**Evaluate:**

- `evaluator.py`: The evaluator, with redesigned `render` and `collect_obs` methods.

#### Usage

FPO is used in largely the same way as the other algorithms in the OmniSafe framework. See OmniSafe's official documentation and `README.md` for the general training, evaluation, and visualization workflow. For FPO specifically, use `example/train_policy.sh` for training, `example/visualize.py` for visualization, and `example/evaluate_saved_policy.py` for evaluation.
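
For orientation, here is a minimal sketch of what a training run might look like through OmniSafe's Python API, as an alternative to the shell script. It assumes that the algorithm is registered under the name `FPO` (matching `FPO.yaml`) and that the config uses the same `train_cfgs` and `algo_cfgs` keys as OmniSafe's other on-policy algorithms. The environment ID `SafetyPointGoal1-v0`, the step counts, and the helper names `build_custom_cfgs` and `train` are illustrative choices, not taken from this repository.

```python
from typing import Any


def build_custom_cfgs(
    total_steps: int = 1_000_000,
    steps_per_epoch: int = 20_000,
) -> dict[str, Any]:
    """Build overrides to layer on top of the defaults in FPO.yaml.

    The key names follow OmniSafe's usual on-policy config layout and are
    assumed (not verified) to apply to FPO.yaml as well.
    """
    return {
        "train_cfgs": {"total_steps": total_steps},
        "algo_cfgs": {"steps_per_epoch": steps_per_epoch},
    }


def train(env_id: str = "SafetyPointGoal1-v0") -> None:
    """Train FPO on one environment (hypothetical helper)."""
    # Imported lazily so build_custom_cfgs can be used without OmniSafe installed.
    import omnisafe

    # "FPO" is assumed to be the registered algorithm name.
    agent = omnisafe.Agent("FPO", env_id, custom_cfgs=build_custom_cfgs())
    agent.learn()
```

Calling `train()` would start a run with these overrides. The script `example/train_policy.sh` remains the reference for the settings actually used in this project.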