# Navigating Tool Dynamics: Active Exploration for Enhancing Adaptation in Large Language Models
This repository contains the code and benchmark (ToolQA-D) of our paper.

# Preparation

## Python Environment
For ToolDyna
```bash
conda create -n tooldyna python=3.11
conda activate tooldyna
pip install -r requirements.txt
```
For API Server
```bash
conda create -n api_server python=3.11
conda activate api_server
pip install -r api_server_requirements.txt
```

## Download Pre-trained LLMs
Download `LLama3-8B` and `Qwen2-7B` from Hugging Face.


## Benchmark: ToolQA-D
* The external corpus can be downloaded from ToolQA. After downloading and unzipping it, place it under `/<YOUR_OWN_PATH>/ToolQA-D/data/external_corpus/`. (**We will release our processed `external_corpus.zip` after the review process.**)
* The test data is in `./test_data`, and the training data for MCTS is in `./train_data_for_mcts`.
* Set the parameter `api_kernel_version` in `./Dyna_MCTS/src/arguments.py` to select the API environment:
    * `api_kernel_version = 0` for $\mathcal{P}_c$
    * `api_kernel_version = 1` for $\mathcal{P}_{s_{\text{in}}}$
    * `api_kernel_version = 2` for $\mathcal{P}_{s_{\text{OOD}}}$
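
For the corpus placement step in the first bullet above, the following is a minimal sketch rather than an official setup script. The variables `TOOLQA_D_ROOT` and `CORPUS_ZIP` are hypothetical names introduced here for illustration; point them at your own checkout and at the archive you downloaded from ToolQA.

```bash
# Assumption: TOOLQA_D_ROOT stands in for /<YOUR_OWN_PATH>/ToolQA-D.
# It defaults to ./ToolQA-D only so that the sketch runs as written.
TOOLQA_D_ROOT="${TOOLQA_D_ROOT:-$PWD/ToolQA-D}"
CORPUS_DIR="$TOOLQA_D_ROOT/data/external_corpus"

# Create the target directory (no-op if it already exists).
mkdir -p "$CORPUS_DIR"

# Assumption: CORPUS_ZIP is the path to the archive downloaded from ToolQA.
# Extraction is skipped when the variable is unset or the file is missing.
if [ -n "${CORPUS_ZIP:-}" ] && [ -f "$CORPUS_ZIP" ]; then
    unzip -q "$CORPUS_ZIP" -d "$CORPUS_DIR"
fi

# Show what ended up in the corpus directory.
ls -A "$CORPUS_DIR"
```

Depending on how the archive is packed, extraction may create a nested folder; if so, move its contents up so that the corpus files sit directly under `data/external_corpus/`.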

# Usage

## API Server
```bash
cd ./Dyna_MCTS/src
bash start_gunicorn.sh
```

## MCTS in ToolDyna
* ToolQA-D easy
```bash
cd  ./Dyna_MCTS/easy_runs
bash multi_run.sh # or bash single_run.sh
```

* ToolQA-D hard
```bash
cd  ./Dyna_MCTS/hard_runs
bash multi_run.sh # or bash single_run.sh
```


## Self-improvement in ToolDyna
* Please `git clone` Llama-Factory from GitHub.
* Our training parameters are in `./self_improvement.sh`.
* We also provide our processed training data in `./train_data`:
    * `llama3.json` is the training data for LLama3.
    * `qwen2.json` is the training data for Qwen2.


## Evaluate the performance on the test set
Set `api_kernel_version` to `0`, `1`, or `2` in `base_sample.sh` to evaluate under $\mathcal{P}_c$, $\mathcal{P}_{s_{\text{in}}}$, or $\mathcal{P}_{s_{\text{OOD}}}$, respectively.
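
If you switch environments often, a small helper can make the change less error-prone. The sketch below is not part of the repository: `set_kernel_version` is a hypothetical function, and it assumes that `base_sample.sh` contains the text `api_kernel_version` followed by `=` or whitespace and then a number. Check the script first and adjust the pattern if it is written differently.

```bash
# Hypothetical helper: rewrite the api_kernel_version value in a script.
# Usage: set_kernel_version <file> <0|1|2>
set_kernel_version() {
    local file="$1" version="$2"

    # Only the three environments described in this README are accepted.
    case "$version" in
        0|1|2) ;;
        *) echo "api_kernel_version must be 0, 1, or 2" >&2; return 1 ;;
    esac

    # Fail early if the file does not contain the assumed pattern.
    if ! grep -Eq 'api_kernel_version[[:space:]=]+[0-9]+' "$file"; then
        echo "no api_kernel_version setting found in $file" >&2
        return 1
    fi

    # Edit in place, keeping a backup copy next to the file as <file>.bak.
    sed -i.bak -E "s/(api_kernel_version[[:space:]=]+)[0-9]+/\1${version}/" "$file"
}

# Example (run from the directory that contains base_sample.sh):
# set_kernel_version base_sample.sh 1
```

The helper keeps a `.bak` copy so the previous setting can be restored by hand.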

### gpt_series model
```bash
cd ./inference_gpt/runs
bash gpt_easy.sh
bash gpt_hard.sh
```

### our model
```bash
cd ./inference/runs
bash easy.sh
bash hard.sh
```

## Others
* We provide the few-shot examples for ToolQA-D easy and hard in `./Dyna_MCTS/src/few_shots`
* You can modify the prompt in `./Dyna_MCTS/src/prompts.py`
* You can add your own API versions in `./Dyna_MCTS/src/api_vary.py`.
* We will release our trained checkpoint after the review process.


With the above information, we believe you can easily reproduce our work.