# FileAgent Repository

This repository contains two main components:
- `infer/`: Inference and tool-calling framework based on OpenAI / vLLM APIs
- `verl/`: Reinforcement learning training recipes based on [verl](https://github.com/volcengine/verl)


Install the Python dependencies before running either component:

```bash
pip install -r requirements.txt
```


## Inference

The inference code lives in the `infer/` directory and supports two tool-calling modes:

- **Function Calling (Recommended)**: `infer.py`  
  - Suitable for: services that support the OpenAI tool-calling protocol (the official OpenAI API, compatible APIs, etc.)  
  - Tool definitions: `infer/tool_config.yaml`  
  - Model and endpoint configuration: `infer/api_config.yaml`, `infer/models_config.json`  
  - Basic usage:
    ```bash
    cd infer
    python infer.py \
      --input /path/to/extracted_bench-v2.json \
      --output /path/to/output_dir \
      --models_config models_config.json
    ```

- **Text Protocol + vLLM**: `tool_vllm_client_sandbox.py`  
  - Suitable for: vLLM / open-source models with only basic chat interfaces that don't support the `tools` field  
  - System prompt: `infer/simple_system_prompt-qwen.txt`  
  - vLLM address configuration: `infer/vllm_config.json`  
  - Basic usage:
    ```bash
    cd infer
    python tool_vllm_client_sandbox.py \
      -i /path/to/input.json \
      -o /path/to/results.json \
      -m your_model_name
    ```


The input is a JSON or JSONL file. Each sample must contain at least the field used as the inference prompt (e.g., `formatted_question`).  
Results are written to the specified output path, one line per sample, and include the model output, tool-call information, and related details.
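
To make the input requirement concrete, here is a minimal sketch of a loader that accepts either format and checks for the prompt field. The helper name `load_samples` and the choice to detect JSONL by file extension are assumptions for illustration; the repository's own scripts may read input differently.

```python
import json
from pathlib import Path


def load_samples(path, required_field="formatted_question"):
    """Load samples from a .json file (a list) or a .jsonl file (one object per line)."""
    text = Path(path).read_text(encoding="utf-8")
    if str(path).endswith(".jsonl"):
        # JSONL: parse each non-empty line as its own JSON object.
        samples = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        samples = json.loads(text)
        if isinstance(samples, dict):
            # Tolerate a single-sample file by wrapping it in a list.
            samples = [samples]
    # Collect the indices of samples that lack the prompt field.
    missing = [i for i, s in enumerate(samples) if required_field not in s]
    if missing:
        raise ValueError(f"samples missing '{required_field}': {missing}")
    return samples
```

Running a check like this before a long inference job surfaces malformed samples early instead of partway through the run.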

### Test Data

- We provide a sample test set: `data/extracted_bench-v2-final.json`  
  - Each sample contains: `task_id`, `level`, `formatted_question`, `prewrites` (list of files to be placed in the sandbox), `answer`, and other fields.  
  - Files referenced in `prewrites[*].fpath` are stored by default in the `data/files/` directory.
- Before use, download the original files from the data release location, extract them, and place the JSON file at `data/extracted_bench-v2-final.json` and the referenced files under `data/files/<filename>`. Then point the `--input` parameter at that JSON file in your inference command.
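
After downloading, it can be useful to confirm that every file referenced by the test set is actually present. The sketch below is one way to do that. It assumes each entry in `prewrites` is an object with an `fpath` key and that files are stored in `data/files/` under their base name; the helper name `find_missing_files` is hypothetical.

```python
import json
from pathlib import Path


def find_missing_files(bench_json, files_dir):
    """Return (task_id, fpath) pairs whose referenced file is absent from files_dir."""
    samples = json.loads(Path(bench_json).read_text(encoding="utf-8"))
    missing = []
    for sample in samples:
        # Samples without a `prewrites` list (or with null) are skipped.
        for item in sample.get("prewrites") or []:
            # Assumption: only the base name of `fpath` matters for lookup.
            name = Path(item["fpath"]).name
            if not (Path(files_dir) / name).is_file():
                missing.append((sample.get("task_id"), item["fpath"]))
    return missing


if __name__ == "__main__":
    for task_id, fpath in find_missing_files(
        "data/extracted_bench-v2-final.json", "data/files"
    ):
        print(f"{task_id}: missing {fpath}")
```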

### Inference Configuration Files

- `infer/api_config.yaml`  
  - `defaults`: Common configuration (e.g., unified `azure_endpoint`, `api_version`, default temperature).  
  - `models`: Configuration for each `model_name` including `api_key`, `max_tokens`, whether to enable thinking, etc.  
  - Read in `infer.py` via `load_api_config(model_name)`; the actual `model_name` values to run are controlled by `infer/models_config.json`.
  - Before use, replace the `api_key` and `azure_endpoint` in the file with your own service information.

- `infer/vllm_config.json`  
  - Used to configure access addresses for vLLM / open-source models: `{ "model_name": { "ipv6": "...", "port": 18908 }, ... }`.  
  - `vllm_api.py` and vLLM-based scripts will look up the corresponding backend address from here based on the provided `model_name`.  
  - If you deploy a new vLLM service, simply add a new entry to this file and use `-m that_model_name` on the command line.
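
The two configuration files can be read roughly as follows. This is a sketch under stated assumptions, not the repository's implementation: `resolve_model_config` shows one plausible reading of the `defaults` / `models` layout (per-model values override shared defaults), and the real `load_api_config` may behave differently. `vllm_base_url` assumes the backend is reached over plain HTTP at the `/v1` path used by vLLM's OpenAI-compatible server. Both function names are hypothetical.

```python
def resolve_model_config(api_config, model_name):
    """Overlay one model's settings on top of the shared defaults."""
    merged = dict(api_config.get("defaults", {}))
    # Per-model keys win over defaults when both are present.
    merged.update(api_config["models"][model_name])
    return merged


def vllm_base_url(vllm_config, model_name):
    """Build a base URL from a vllm_config.json entry for the given model."""
    entry = vllm_config[model_name]
    # IPv6 literals must be wrapped in square brackets inside a URL.
    # Assumption: HTTP scheme and the `/v1` prefix.
    return f"http://[{entry['ipv6']}]:{entry['port']}/v1"
```

For example, with the entry `{"my_model": {"ipv6": "::1", "port": 18908}}`, `vllm_base_url(config, "my_model")` returns `http://[::1]:18908/v1`.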

## Training

The training code is located in the `verl/` directory. It builds on the official `verl` framework and adds a sandbox for file interaction, which enables multi-turn tool-calling reinforcement learning.

### Training Script Examples

- `verl/run_extracted_bench_qwen3-32b.sh`

Each script submits a `verl.trainer.main_ppo` job via Ray and loads the following configurations:

- Training configuration: `verl/recipe/fileagent/config/extracted_bench_trainer.yaml`  
- Tool/agent configuration: `verl/recipe/fileagent/config/tool/extracted_bench_tool.yaml`, `verl/recipe/fileagent/config/agent_loop.yaml`  
- Data and dataset definitions: `verl/recipe/fileagent/rl_dataset.py` and related parquet data files

### Modifications Required Before Running

Before running, adjust the following to match your environment:

- Model and data paths (at the top of the scripts)  
  - `MODEL_PATH`: Point to your base model checkpoint (e.g., Qwen-32B series)  
  - `DATA_HOME` / `TRAIN_FILES` / `VAL_FILES`: Point to your locally prepared RL training/validation data (parquet format)
- Output and Ray cluster  
  - `SAVE_DIR`: Output directory for training logs and model weights  
  - `RAY_ADDRESS`: Your Ray cluster address (this can also be the address of a local cluster started with `ray start --head`)
- Runtime environment  
  - `verl/runtime_env.yaml`: Runtime configuration for `ray job submit`. Delete or replace the internal proxies and private service URLs, and keep only the environment variables and pip dependencies you need.

After making these modifications, you can start training on the Ray cluster, for example:

```bash
cd verl
bash run_extracted_bench_qwen3-32b.sh
```

Other training hyperparameters (batch size, max tokens, `adv_estimator`, and so on) can be adjusted directly in the scripts and in `recipe/fileagent/config/extracted_bench_trainer.yaml`.
