# Introduction
This repository provides the code for evaluating and demonstrating the baseline and full-duplex dialogue systems. It has two main parts:
* How to run the simulated test process described in the paper
* How to start a demo of the full-duplex voice dialogue system

To meet the paper review requirements, we removed most of the comments from the code. Because time was short, the code has only been tidied up in a preliminary way.
We will refactor the code later and open-source it on GitHub.



# Preparation

## Images

Because of the paper's anonymity requirement, we cannot provide the base image yet. It will be provided once the code is open-sourced.

## ASR Service

For a non-streaming ASR service, you can follow [tiny-openai-whisper-api](https://github.com/morioka/tiny-openai-whisper-api) to deploy an OpenAI Whisper API service.
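
Once such a service is running, a client can send it audio over HTTP. The sketch below assumes the service exposes the OpenAI-style `POST /v1/audio/transcriptions` endpoint with `model` and `file` form fields and returns JSON containing a `text` field. The helper names `build_transcription_request` and `transcribe` are hypothetical and not part of this repository. Only the Python standard library is used.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model):
    """Build a multipart/form-data POST for an OpenAI-style transcription endpoint.

    The endpoint path and form field names are assumptions about the deployed service.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url, audio_path, model):
    """Send one audio file and return the recognized text (assumes a JSON `text` field)."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    req = build_transcription_request(
        base_url, audio_bytes, os.path.basename(audio_path), model
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["text"]
```

For example, `transcribe(ASR_ADDR, "/path/to/audio.wav", ASR_MODEL_NAME)` would return the transcript, where `ASR_ADDR` and `ASR_MODEL_NAME` are the same placeholders used in the benchmark commands.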

For the streaming ASR service, we deployed a Transformer-based model on internal infrastructure. The deployment script cannot be released for now.

## LLM Service

We provide a simple LLM chat service script that can reuse the KV cache to speed up inference. An example startup command is as follows:

```
CUDA_VISIBLE_DEVICES=0 nohup python service/llm_api.py --host 0.0.0.0 --port 8080 --model-path /path/to/model/ --device cuda >> ~/llm.log 2>&1 &
```
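
The idea behind KV cache reuse can be illustrated with a toy model: when a new prompt shares a prefix with the previously processed one (as in a multi-turn chat), only the tokens after the shared prefix need to be computed again. The sketch below is a conceptual illustration under that assumption and is not the implementation in `service/llm_api.py`. The class `PrefixCache` and the `compute_kv` callback are hypothetical, and a real transformer computes each token's key/value entries from the whole preceding context rather than from the token alone.

```python
def common_prefix_len(cached_ids, new_ids):
    """Return how many leading token ids the two sequences share."""
    n = 0
    for a, b in zip(cached_ids, new_ids):
        if a != b:
            break
        n += 1
    return n


class PrefixCache:
    """Toy stand-in for a per-session KV cache with one entry per processed token."""

    def __init__(self):
        self.token_ids = []  # tokens whose entries are currently cached
        self.kv = []         # placeholder for per-token key/value tensors

    def update(self, new_ids, compute_kv):
        """Reuse entries for the shared prefix and compute only the remainder.

        Returns the number of tokens that had to be computed.
        """
        keep = common_prefix_len(self.token_ids, new_ids)
        todo = new_ids[keep:]
        # Entries past the shared prefix are stale, so they are dropped.
        self.kv = self.kv[:keep] + [compute_kv(t) for t in todo]
        self.token_ids = list(new_ids)
        return len(todo)
```

With this structure, a second request that extends the first one by two tokens triggers two computations instead of reprocessing the full prompt.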

## TTS Service

For a non-streaming TTS service, you can use [Coqui-TTS](https://github.com/coqui-ai/TTS) to build an API service. An example command:

```
CUDA_VISIBLE_DEVICES=4 nohup python -m TTS.server.server --model_name tts_models/en/ljspeech/vits --port 8981 --use_cuda True >> ~/tts.log 2>&1 &
```
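
To check that the non-streaming TTS server responds, a client can request a short utterance. The sketch below assumes the Coqui-TTS demo server's `GET /api/tts?text=...` endpoint, which returns WAV audio. The helper names `tts_url` and `synthesize` are hypothetical and not part of this repository.

```python
import urllib.parse
import urllib.request


def tts_url(base_url, text):
    """Build the synthesis URL (assumes the server's /api/tts endpoint)."""
    query = urllib.parse.urlencode({"text": text})
    return f"{base_url.rstrip('/')}/api/tts?{query}"


def synthesize(base_url, text, out_path):
    """Fetch synthesized audio and write it to out_path; returns the byte count."""
    with urllib.request.urlopen(tts_url(base_url, text), timeout=60) as resp:
        wav = resp.read()
    with open(out_path, "wb") as f:
        f.write(wav)
    return len(wav)
```

For the server started above, `synthesize("http://localhost:8981", "Hello there.", "hello.wav")` would save the audio to `hello.wav`.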

For the streaming TTS service, we built a service based on [RealtimeTTS](https://github.com/KoljaB/RealtimeTTS). We provide a simple script, started as follows:

```
SERVICE_PORT=8981 CUDA_VISIBLE_DEVICES=4 nohup python service/streaming_tts_api.py >> ~/tts.log 2>&1 &
```
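
Because the services above are started in the background with `nohup`, it can help to confirm that each port accepts connections before running a benchmark or demo. The following is a minimal sketch of such a check. The function `wait_for_port` is hypothetical and not part of this repository, and an open port only shows that the process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to (host, port) succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

With the example commands above, `wait_for_port("localhost", 8080)` would check the LLM service and `wait_for_port("localhost", 8981)` the TTS service.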

# Run Benchmark

In the configuration names below, `ns-` means non-streaming, `s-` means streaming, and `fd-` means full-duplex.

## Run the baseline (ns-asr + llm + ns-tts) benchmark and get the record
```
python -m chatbot.benchmark.benchmark_baseline \
    --llm_addr LLM_ADDR \
    --llm_model_name LLM_MODEL_NAME \
    --llm_max_token 256 \
    --llm_type baseline \
    --asr_addr ASR_ADDR \
    --asr_model_name ASR_MODEL_NAME \
    --tts_addr TTS_ADDR \
    --benchmark_sample_path SAMPLE_PATH \
    --audio_data_dir PREPARED_AUDIO_PATH \
    --record_result_path RESULT_RECORD_PATH
```
Analyze the baseline benchmark record:
```
python -m chatbot.benchmark.analyze \
    --record-path RESULT_RECORD_PATH \
    --llm-type baseline
```

## Run the (ns-asr + fd-llm + ns-tts) benchmark and get the record
```
python -m chatbot.benchmark.benchmark_baseline \
    --llm_addr LLM_ADDR \
    --llm_model_name LLM_MODEL_NAME \
    --llm_max_token 256 \
    --llm_type llm-fd \
    --asr_addr ASR_ADDR \
    --asr_model_name ASR_MODEL_NAME \
    --tts_addr TTS_ADDR \
    --benchmark_sample_path SAMPLE_PATH \
    --audio_data_dir PREPARED_AUDIO_PATH \
    --record_result_path RESULT_RECORD_PATH
```
Analyze the benchmark record:
```
python -m chatbot.benchmark.analyze \
    --record-path RESULT_RECORD_PATH \
    --llm-type llm-fd
```

## Run the (s-asr + fd-llm + ns-tts) benchmark and get the record
```
python -m chatbot.benchmark.benchmark_baseline \
    --llm_addr LLM_ADDR \
    --llm_model_name LLM_MODEL_NAME \
    --llm_max_token 256 \
    --llm_type llm-fd \
    --asr_addr ASR_ADDR \
    --asr_model_name ASR_MODEL_NAME \
    --stream_asr \
    --tts_addr TTS_ADDR \
    --benchmark_sample_path SAMPLE_PATH \
    --audio_data_dir PREPARED_AUDIO_PATH \
    --record_result_path RESULT_RECORD_PATH
```
Analyze the benchmark record:
```
python -m chatbot.benchmark.analyze \
    --record-path RESULT_RECORD_PATH \
    --llm-type llm-fd
```

## Run the (s-asr + fd-llm + s-tts) benchmark and get the record
```
python -m chatbot.benchmark.benchmark_baseline \
    --llm_addr LLM_ADDR \
    --llm_model_name LLM_MODEL_NAME \
    --llm_max_token 256 \
    --llm_type llm-fd \
    --asr_addr ASR_ADDR \
    --asr_model_name ASR_MODEL_NAME \
    --stream_asr \
    --tts_addr TTS_ADDR \
    --stream_tts \
    --benchmark_sample_path SAMPLE_PATH \
    --audio_data_dir PREPARED_AUDIO_PATH \
    --record_result_path RESULT_RECORD_PATH
```
Analyze the benchmark record:
```
python -m chatbot.benchmark.analyze \
    --record-path RESULT_RECORD_PATH \
    --llm-type llm-fd
```
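
The four benchmark configurations differ only in `--llm_type` and in whether `--stream_asr` and `--stream_tts` are passed. The sketch below builds the corresponding command lines from a small table, which may be convenient when running all configurations in sequence. The flag names and values are copied from the commands above; the `CONFIGS` table and the helper functions are hypothetical and not part of this repository.

```python
import shlex

# config name -> (llm_type, use --stream_asr, use --stream_tts)
CONFIGS = {
    "ns-asr+llm+ns-tts": ("baseline", False, False),
    "ns-asr+fd-llm+ns-tts": ("llm-fd", False, False),
    "s-asr+fd-llm+ns-tts": ("llm-fd", True, False),
    "s-asr+fd-llm+s-tts": ("llm-fd", True, True),
}


def benchmark_command(config, llm_addr, llm_model_name, asr_addr, asr_model_name,
                      tts_addr, sample_path, audio_dir, record_path,
                      llm_max_token=256):
    """Return the benchmark invocation for one configuration as an argv list."""
    llm_type, stream_asr, stream_tts = CONFIGS[config]
    cmd = [
        "python", "-m", "chatbot.benchmark.benchmark_baseline",
        "--llm_addr", llm_addr,
        "--llm_model_name", llm_model_name,
        "--llm_max_token", str(llm_max_token),
        "--llm_type", llm_type,
        "--asr_addr", asr_addr,
        "--asr_model_name", asr_model_name,
        "--tts_addr", tts_addr,
        "--benchmark_sample_path", sample_path,
        "--audio_data_dir", audio_dir,
        "--record_result_path", record_path,
    ]
    if stream_asr:
        cmd.append("--stream_asr")
    if stream_tts:
        cmd.append("--stream_tts")
    return cmd


def analyze_command(config, record_path):
    """Return the matching analysis invocation as an argv list."""
    llm_type = CONFIGS[config][0]
    return ["python", "-m", "chatbot.benchmark.analyze",
            "--record-path", record_path, "--llm-type", llm_type]


def as_shell(cmd):
    """Render an argv list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` or printed with `as_shell` for inspection.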


# Run Demo

## Run the baseline (ns-asr + llm + ns-tts) demo

```
python -m chatbot.demo.baseline_demo \
    --llm_addr LLM_ADDR \
    --llm_model_name LLM_MODEL_NAME \
    --llm_max_token 256 \
    --llm_type baseline \
    --asr_addr ASR_ADDR \
    --asr_model_name ASR_MODEL_NAME \
    --tts_addr TTS_ADDR
```

## Run the full-duplex (s-asr + fd-llm + s-tts) demo
```
python -m chatbot.demo.baseline_demo \
    --llm_addr LLM_ADDR \
    --llm_model_name LLM_MODEL_NAME \
    --llm_max_token 256 \
    --llm_type llm-fd \
    --asr_addr ASR_ADDR \
    --asr_model_name ASR_MODEL_NAME \
    --stream_asr \
    --tts_addr TTS_ADDR \
    --stream_tts
```




