# Language-Conditioned Multi-Style Policies with Reinforcement Learning

Welcome to the official PyTorch implementation of the paper:

"Language-Conditioned Multi-Style Policies with Reinforcement Learning"

Here is an overview of the training architecture:

![System Architecture Diagram](images/Training%20Process.png)

This repository includes the code for both multi-style policy training and instruction generation. 

This branch is primarily used for experimental testing in the [Highway environment](https://highway-env.farama.org/installation/).

## Requirements

### 1. Setup environment

This code was tested on Linux and requires the following:

* Python: 3.8
* Conda: Anaconda3 or Miniconda3
* GPU: CUDA-capable GPU
* Docker
* Nginx

To set up the Conda environment, run:

```
conda create -n lcmsp python=3.8
conda activate lcmsp
pip install -r requirements.txt
```
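
As a quick sanity check after installation, the requirements listed above (Python 3.8 and a CUDA-capable GPU) can be verified from inside the `lcmsp` environment. The following is a minimal sketch, not part of this repository: the function name `environment_report` is hypothetical, and it assumes PyTorch is available in the environment (if it is not, the report simply says so).

```python
import sys


def environment_report(version_info=sys.version_info):
    """Summarise whether the interpreter and GPU match the stated requirements."""
    report = {"python_ok": tuple(version_info[:2]) == (3, 8)}
    try:
        import torch  # assumption: PyTorch is installed in this environment

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        # PyTorch is missing, so CUDA support cannot be confirmed.
        report["torch_version"] = None
        report["cuda_available"] = False
    return report
```

Calling `print(environment_report())` in the activated environment shows at a glance whether anything is missing.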

### 2. Download Installation Packages for Samplers
Navigate to the `lcmsp` directory and download the required PyTorch wheel packages:
```
cd lcmsp
wget https://download.pytorch.org/whl/cpu/torchaudio-2.3.0%2Bcpu-cp38-cp38-linux_x86_64.whl#sha256=424499caf711673302263c22422d3a6a9e37c918776205081315ef6870dafde2
wget https://download.pytorch.org/whl/cpu/torch-2.3.0%2Bcpu-cp38-cp38-linux_x86_64.whl#sha256=8c52484880d5fbe511cffc255dd34847ddeced3f94334c6bf7eb2b0445f10cb4
wget https://download.pytorch.org/whl/cpu/torchvision-0.18.0%2Bcpu-cp38-cp38-linux_x86_64.whl#sha256=d20a65db1c6821fd79b8b4678cd21c14c8d8c1a1fd0d5ae2b43528db568ea890
```
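
Each URL above carries the expected checksum in its `#sha256=` fragment, but `wget` does not verify it. The sketch below shows one way to check the downloaded wheels. It assumes the files were saved in the current directory under URL-decoded names (`%2B` written as `+`); adjust the names if your download tool wrote them differently. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the `#sha256=` fragments of the URLs above.
# The file names are an assumption about how the downloads were saved.
EXPECTED_SHA256 = {
    "torchaudio-2.3.0+cpu-cp38-cp38-linux_x86_64.whl":
        "424499caf711673302263c22422d3a6a9e37c918776205081315ef6870dafde2",
    "torch-2.3.0+cpu-cp38-cp38-linux_x86_64.whl":
        "8c52484880d5fbe511cffc255dd34847ddeced3f94334c6bf7eb2b0445f10cb4",
    "torchvision-0.18.0+cpu-cp38-cp38-linux_x86_64.whl":
        "d20a65db1c6821fd79b8b4678cd21c14c8d8c1a1fd0d5ae2b43528db568ea890",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheels(directory="."):
    """Return {file name: 'ok' | 'mismatch' | 'missing'} for each expected wheel."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Run `print(verify_wheels())` from the `lcmsp` directory; any entry other than `ok` suggests the download should be repeated.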

### 3. Build the Sampler Docker Image
Build the Docker image for the sampler by running:
```
docker build -t highway_sampler .
```

### 4. Configure Nginx for Model Transfer
To set up Nginx as a file server for model transfer, follow these steps:

#### Edit the Nginx Configuration File:

Open the Nginx configuration file using your preferred text editor. Here, we use vim:
```
sudo vim /etc/nginx/nginx.conf
```

#### Modify the Server Configuration:
Within the configuration file, add or modify the following server block:

```
server {
    listen       80;
    server_name  your_machine_ip;

    location / {
        root   /home/distributed_swap;
        autoindex on;
    }
}
```

Note: Replace `your_machine_ip` with the actual IP address of your machine.

#### Restart the Nginx Service:

After saving the changes to the configuration file, restart the Nginx service to apply the updates:

```
sudo systemctl restart nginx
```

This configuration makes Nginx serve the models in the `/home/distributed_swap` directory over HTTP,
so that other components of the system can download them from the specified IP address.
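
To confirm that the server block works, you can request a file from another machine. The sketch below uses only the Python standard library. The helper names and the file name `model.pth` in the usage note are hypothetical; substitute a file that actually exists under `/home/distributed_swap`.

```python
from urllib.parse import quote
from urllib.request import urlopen


def model_url(host, relative_path, port=80):
    """Build the HTTP URL under which Nginx would expose a file in the served root."""
    path = quote(relative_path.lstrip("/"))
    netloc = host if port == 80 else f"{host}:{port}"
    return f"http://{netloc}/{path}"


def is_served(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return False
```

For example, `is_served(model_url("your_machine_ip", "model.pth"))` should return `True` once Nginx has been restarted and the file is in place.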


## Multi-style Policy Training

The framework for multi-style policy training consists of two main components:

* Learner: Responsible for training tasks.
* Sampler: Responsible for data generation.

The sampler and learner share data structure definitions found in `config.json` and `worker/instance.py`.

### 1. Start Learner
Before running the learner,
update `config.json` by replacing `your_machine_ip` with the actual IP address of your training machine.
For example:
```
    ...
    "log_server_address": "your_machine_ip",
    "log_server_port": 8100,

    "config_server_address": "your_machine_ip",
    "config_server_request_model_port": 9000,
    "config_server_model_update_port": 9001,
    "config_server_hot_update_port": 9002,
    ...
```
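
If you prefer not to edit the file by hand, the replacement can be scripted. The following is a minimal sketch under the assumption that `your_machine_ip` appears in `config.json` only as this placeholder; the function names are hypothetical and not part of this repository. It works on the raw text so that the rest of the file is left untouched.

```python
import ipaddress
from pathlib import Path

PLACEHOLDER = "your_machine_ip"


def fill_in_ip(config_text, ip_address):
    """Return the config text with every placeholder replaced by ip_address."""
    ipaddress.ip_address(ip_address)  # raises ValueError for a malformed address
    if PLACEHOLDER not in config_text:
        raise ValueError(f"{PLACEHOLDER!r} not found; the config may already be edited")
    return config_text.replace(PLACEHOLDER, ip_address)


def update_config_file(path, ip_address):
    """Rewrite the config file in place with the placeholder filled in."""
    config_path = Path(path)
    original = config_path.read_text(encoding="utf-8")
    config_path.write_text(fill_in_ip(original, ip_address), encoding="utf-8")
```

Usage: `update_config_file("config.json", "192.0.2.10")`, with the example address replaced by that of your training machine.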


Start the servers for the learner by executing:
```
python start_server.py
```

Server Descriptions:

* data_server: Each learner server process is paired with `data_server_to_learner_num` data servers.
These servers receive data from the samplers, maintain a local data pool,
and regularly sample from that pool into Plasma (an in-memory object store) for the learner server to consume.

* learner_server: Polls Plasma for ready data and performs training.
The learner server with global rank 0 also periodically pushes the updated model to the config server.

* config_server: Serves the latest model to samplers on request.

* log_server: Collects global logs from the sampler and each server, displaying them via TensorBoard.
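
To illustrate the data server's role described above, here is a simplified sketch of a bounded local data pool. It is not the repository's implementation: the class name `LocalDataPool`, the fixed capacity, and uniform sampling without replacement are all assumptions, and the hand-off to Plasma is replaced by a plain return value.

```python
import random
from collections import deque


class LocalDataPool:
    """Illustrative bounded pool: newest samples are kept, oldest are dropped."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)  # deque discards the oldest entry when full
        self._rng = random.Random(seed)

    def add(self, instance):
        """Store one training instance received from a sampler."""
        self._items.append(instance)

    def __len__(self):
        return len(self._items)

    def sample_batch(self, batch_size):
        """Draw batch_size instances uniformly without replacement.

        Returns None while the pool holds fewer than batch_size instances,
        so the caller can simply wait for more sampler data.
        """
        if len(self._items) < batch_size:
            return None
        return self._rng.sample(list(self._items), batch_size)
```

In this picture, a data server would call `add` for each incoming instance and periodically call `sample_batch`, handing any non-empty result to the learner.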

After training is complete, you can terminate all learner servers by running:
```
python kill_server.py
```

Note: This command will terminate all Python processes associated with the servers. 
If your GPU is also running other Python tasks simultaneously, 
please exercise caution to avoid terminating unintended processes.


### 2. Start Sampler
#### Option A: Using a Cluster
If you have access to a computing cluster that can run multiple Docker containers concurrently, 
we recommend starting the sampler in the cluster environment for optimal performance.

#### Option B: Running on a Single Machine
If a cluster is not available, you can start multiple Docker containers on the same machine to serve as samplers. 
Be aware that a single machine generates far less data than a cluster, so training will be considerably slower.

To start the sampler on a single machine, follow these steps:

##### 1. Run the Docker container:
```
docker run --name sampler -p 30000:30000 -itd highway_sampler
docker exec -it sampler sh
```

##### 2. Inside the Docker environment, start the sampler by executing:
```
python3 worker/sampler.py
```
The sampler will now continuously run and send data to the learner for training.
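
To run several samplers on one machine, the two steps above can be repeated with a different container name each time. The sketch below only builds the corresponding `docker` command lines. The naming scheme `sampler_<i>` and the choice of a distinct host port per container (to avoid port clashes) are assumptions, not settings taken from this repository.

```python
import subprocess

IMAGE = "highway_sampler"
CONTAINER_PORT = 30000


def sampler_commands(count, base_host_port=30000):
    """Build `docker run` / `docker exec` argument lists for `count` samplers."""
    commands = []
    for index in range(count):
        name = f"sampler_{index}"  # hypothetical naming scheme
        host_port = base_host_port + index  # assumption: one host port per container
        commands.append(
            ["docker", "run", "--name", name,
             "-p", f"{host_port}:{CONTAINER_PORT}", "-itd", IMAGE]
        )
        # Start the sampler inside the container in detached mode.
        commands.append(["docker", "exec", "-d", name, "python3", "worker/sampler.py"])
    return commands


def launch(count):
    """Execute the generated commands one by one, stopping at the first failure."""
    for command in sampler_commands(count):
        subprocess.run(command, check=True)
```

For instance, `launch(4)` would start four containers. Print the result of `sampler_commands(4)` first to review the commands before running them.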


## Instruction Generation

By default, this code utilizes GPT-4o to generate instructions. 
To enable this functionality, you need to provide your own authorization key. 

Please add your key to the completion function in `language_control/llm.py` by modifying the following section:

```
    headers = {
        "Content-Type": "application/json",
        "Authorization": "your authorization",
    }
```

Replace `your authorization` with your actual API key or authorization token.
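
To keep the key out of version control, you may prefer to read it from an environment variable instead of hard-coding it. This is a minimal sketch: the variable name `LLM_AUTHORIZATION` and the helper `build_headers` are hypothetical and do not exist in `language_control/llm.py`. The value is passed through unchanged, so it must already be in whatever form your API endpoint expects.

```python
import os

AUTH_ENV_VAR = "LLM_AUTHORIZATION"  # hypothetical variable name


def build_headers(environ=os.environ):
    """Build the request headers, taking the authorization value from the environment."""
    authorization = environ.get(AUTH_ENV_VAR, "").strip()
    if not authorization:
        raise RuntimeError(f"Set the {AUTH_ENV_VAR} environment variable first")
    return {
        "Content-Type": "application/json",
        "Authorization": authorization,
    }
```

With `export LLM_AUTHORIZATION=...` set in your shell, the `headers` dictionary in the completion function could then be replaced by `headers = build_headers()`.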

After configuring your authorization, you can generate Normal type instructions for the Highway environment by running:

```
python language_control/highway/natural_language_instruction_generate.py
```

If you wish to generate instructions of other types, 
simply modify the prompt in `language_control/highway/prompt_en.py` to suit your needs.
