# HybridFlow

## Introduction

This repository is the official implementation of **HybridFlow: Resource-Adaptive Subtask Routing for Efficient LLM Inference in Edge-Cloud Collaboration**.

# Efficient LLM Collaboration Inference Framework

HybridFlow is a collaborative inference framework that coordinates multiple large language models (LLMs). By combining automatic task decomposition, dynamic model allocation, and parallel execution, it solves complex problems that are difficult for a single model to handle efficiently.

## Core Features

- **Automatic Task Decomposition**: Uses a "Planner" model to decompose a complex problem into a directed acyclic graph (DAG) of smaller, manageable subtasks.
- **Dynamic Model Allocation**: Intelligently assigns the most suitable model (e.g., a smaller, faster model or a larger, more powerful model) to each subtask based on its estimated difficulty.
- **Parallel Task Execution**: Employs a multi-threaded scheduler to execute independent subtasks in parallel, significantly reducing the overall processing time.
- **Comprehensive Performance Monitoring**: Tracks and reports detailed performance metrics, including execution time, time to first token (TTFT), token usage, and estimated cost.
- **Batch Processing and Evaluation**: Supports batch processing of datasets and system evaluation, enabling robust model and system performance analysis.
- **Dataset Building Mode**: Built-in dataset building functionality that automatically generates high-quality "question-plan" pairs based on questions and reference answers, for fine-tuning the planner model.
- **Theoretical Performance Modeling**: Generates theoretical performance reports (including Gantt charts) from task plans and model performance data, which can be used to analyze and optimize task scheduling.
- **Benchmarking Tool**: Provides a standalone single-model test script (`single_model_only.py`) for benchmarking a single model directly against the collaborative framework.
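
As an illustration of the TTFT metric listed above, the following minimal sketch times a streamed response. `measure_stream` and `fake_stream` are hypothetical names used only for this example; the project's own tracking lives in `performance.py` and may work differently.

```python
import time


def measure_stream(chunks):
    """Consume an iterable of text chunks and record simple timing metrics."""
    start = time.perf_counter()
    ttft = None
    pieces = []
    for chunk in chunks:
        if ttft is None:
            # Time elapsed until the first chunk arrives (time to first token).
            ttft = time.perf_counter() - start
        pieces.append(chunk)
    total = time.perf_counter() - start
    return {
        "text": "".join(pieces),
        "ttft_s": ttft,
        "total_s": total,
        "chunks": len(pieces),
    }


def fake_stream():
    """Stand-in for a streaming model response (hypothetical)."""
    for word in ["Hybrid", "Flow"]:
        time.sleep(0.01)
        yield word


stats = measure_stream(fake_stream())
print(stats["text"], stats["chunks"])
```

Any iterator of text chunks can be passed in, so the same helper could wrap a real streaming client.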

## Working Principle

The system operates through a structured, multi-stage process:

1. **Planning (Task Decomposition)**: The user's query is first sent to a **Planner Model**. This model analyzes the problem and generates a detailed execution plan in XML format. This plan outlines a series of steps, each containing specific tasks, estimated difficulty, token budget, and dependencies on other steps.
2. **Scheduling (Task Assignment)**: The system parses the XML plan to construct a Directed Acyclic Graph (DAG). The scheduler then identifies all tasks whose dependencies are satisfied and assigns them for execution.
3. **Execution (Solving Subtasks)**: Each assigned task is sent to an **Executor Model**. The system dynamically selects a small, efficient model for simpler tasks or a large, powerful model for more complex tasks based on the `Difficulty` attribute in the plan.
4. **Parallel Processing**: The scheduler uses a thread pool to execute multiple independent subtasks concurrently. When a task completes, the scheduler checks whether any new tasks have been unlocked by the satisfied dependency and adds them to the execution queue.
5. **Aggregation (Generate Final Answer)**: Once all tasks in the plan are completed, the system collects the results of each step and performs a final aggregation to generate the final answer to the original query.
6. **Reporting**: After the run completes, the system generates a detailed report covering performance metrics, cost analysis, the task dependency graph, and quantified evaluation scores.
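
The scheduling and parallel execution steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `execution.py`: the XML tag and attribute names (`Plan`, `Step`, `ID`, `Depends`) are a hypothetical plan layout, and `parse_plan`, `run_plan`, and `fake_execute` are names invented for this example.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical plan layout; the real planner output may use other tags/attributes.
PLAN_XML = """
<Plan>
  <Step ID="1" Difficulty="2" Depends="">List the tile orientations</Step>
  <Step ID="2" Difficulty="3" Depends="">List the grid positions</Step>
  <Step ID="3" Difficulty="7" Depends="1,2">Combine orientations and positions</Step>
</Plan>
"""


def parse_plan(xml_text):
    """Return {step_id: {"task", "difficulty", "deps"}} from the assumed layout."""
    steps = {}
    for node in ET.fromstring(xml_text).iter("Step"):
        deps = {d.strip() for d in node.get("Depends", "").split(",") if d.strip()}
        steps[node.get("ID")] = {
            "task": (node.text or "").strip(),
            "difficulty": int(node.get("Difficulty", "0")),
            "deps": deps,
        }
    return steps


def run_plan(steps, execute, workers=10):
    """Run each step once its dependencies finish; independent steps overlap."""
    results, pending, running = {}, dict(steps), {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # Submit every step whose dependencies have all completed.
            ready = [sid for sid, s in pending.items() if s["deps"].issubset(results)]
            for sid in ready:
                step = pending.pop(sid)
                context = {d: results[d] for d in step["deps"]}
                running[pool.submit(execute, sid, step, context)] = sid
            if not running:
                raise ValueError("plan has a cycle or a missing dependency")
            # Wait for at least one task, then loop to unlock its dependents.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


def fake_execute(step_id, step, context):
    """Stand-in for a call to an Executor Model (hypothetical)."""
    return f"step {step_id} done using {sorted(context)}"


plan = parse_plan(PLAN_XML)
results = run_plan(plan, fake_execute, workers=4)
print(results["3"])  # step 3 done using ['1', '2']
```

Steps 1 and 2 have no dependencies, so they are submitted together; step 3 is submitted only after both have finished and receives their results as context.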

## Installation and Setup

1. **Install dependencies**

It is recommended to create a virtual environment first.

```bash
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
```

Install the required packages from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

2. **Configure API Keys and Models**

Copy the example configuration file:

```bash
cp config.example.yaml config.yaml
```

- Edit `config.yaml` to set your models and API credentials.
- Place each API key in the file specified by the corresponding `*_key_path` setting (for example, in a folder named `usage/`).

## Detailed Explanation of the Configuration File (`config.yaml`)

`config.yaml` is the system's central control file. The following is a detailed description of each part:

```yaml
# Model Configuration
models:
  small_model: "qwen2.5-3b-instruct" 
  large_model: "gpt-4o"
  router_model: "qwen3-1.7b"
  enable_threshold: True
  threshold: 5 # Use a larger model when the difficulty is greater than or equal to this value.
  use_local_router: True # Whether to use the locally deployed planner model
  local_router_model: "saves/Qwen3-1.7B-Thinking/full/train_2025-09-25-23-33-09" # Local model path

# API configuration
api:
  small_key_path: "usage/qwen"
  large_key_path: "usage/bianxie1"
  router_key_path: "usage/local"
  small_api_base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
  large_api_base_url: "https://api.bianxie.ai/v1"
  router_api_base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
  local_router_base_url: "http://127.0.0.1:8000/v1" # local router API

# System Configuration
system:
  workers: 10 # Number of parallel worker threads

# Query configuration (for single query mode)
query: "Define all possible orientations and placements of the L-shaped tile within the 2x5 rectangle."

# Dataset configuration (used for dataset evaluation or build modes)
dataset:
  enabled: True  # Set to True to enable dataset mode.
  path: "dataset/TestData/MMLU-STEM.json"
  limit: 50 # Optional: limit the number of questions processed
  
  # Dataset building configuration
  build:
    enabled: False  # Set to True to build the dataset instead of evaluating it.
    use_models_for_execution: False  # If false, only generate the plan and do not execute the subtasks.
    save_thinking: False  # Whether to keep the planner's <think> block in the saved output

# Evaluation configuration
evaluation:
  enabled: False  # The master switch for evaluation functions
  planner_enabled: False  # Planner evaluation switch
  executor_enabled: False  # Executor evaluation switch
  model: "deepseek-chat"  # Referee model used for evaluation
  key_path: "usage/deepseek2"
  api_base_url: "https://api.deepseek.com"
```
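
The `threshold` rule in the configuration above can be read as the following minimal sketch. `select_model` is a hypothetical helper written for illustration, not the repository's API, and it does not model the `enable_threshold` switch.

```python
def select_model(difficulty, small_model="qwen2.5-3b-instruct",
                 large_model="gpt-4o", threshold=5):
    """Pick the large model when difficulty >= threshold, else the small one."""
    return large_model if difficulty >= threshold else small_model


print(select_model(4))  # qwen2.5-3b-instruct
print(select_model(5))  # gpt-4o
```

With the default `threshold: 5`, a subtask of difficulty 5 is already routed to the large model because the comparison is inclusive.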

## Usage: Collaborative Reasoning Framework

The system supports three main operating modes: single query, dataset evaluation, and dataset construction.

### 1. Single Query Mode

Used to handle a single complex problem.

1. In `config.yaml`, ensure that `dataset: enabled:` is set to `false`.
2. Define your question in the `query:` field.
3. Run the main script:
    ```bash
    python main.py --config config.yaml
    ```
4. Detailed results will be stored in the `data_reports/single/` directory and organized by model configuration and timestamp.

### 2. Dataset Evaluation Mode

Used to run and evaluate system performance in batches on a specified dataset.

1. In `config.yaml`, enable the `dataset` section and ensure that `build: enabled:` is set to `false`.
    ```yaml
    dataset:
      enabled: true
      path: "dataset/TestData/MMLU-STEM.json"
      limit: 50 # optional
    ```
2. Run the main script:
    ```bash
    python main.py
    ```
3. The evaluation report will be saved in the `data_reports/dataset/` directory. The report `dataset_report.md` contains the following **task planning metrics**:
    - **Average Number of Task Steps**: The average number of subtasks into which the Planner decomposes each problem.
    - **Average Compression Ratio**: The critical path depth of the task graph divided by the total number of tasks. **The lower this value, the higher the parallel potential of the tasks.**
    - **Average Token Limit per Step**: The average token budget the Planner allocates to each subtask.
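
As a worked illustration of the compression ratio defined above, here is a minimal sketch. It assumes depth is counted in tasks along the longest dependency chain; `task_metrics.py` may define depth differently, and `compression_ratio` is a hypothetical name used only here.

```python
from functools import lru_cache


def compression_ratio(deps):
    """deps maps each task id to the list of task ids it depends on."""

    @lru_cache(maxsize=None)
    def depth(task):
        # Depth of a task = 1 + depth of its deepest dependency (assumption).
        return 1 + max((depth(d) for d in deps[task]), default=0)

    critical_path = max(depth(t) for t in deps)
    return critical_path / len(deps)


# A strictly sequential chain cannot be parallelized.
print(compression_ratio({"a": [], "b": ["a"], "c": ["b"]}))  # 1.0
# Four independent tasks can all run at once.
print(compression_ratio({"a": [], "b": [], "c": [], "d": []}))  # 0.25
```

A plan with two independent tasks feeding a third has depth 2 over 3 tasks, giving a ratio of about 0.67.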

### 3. Dataset Construction Mode

This mode generates training data for fine-tuning the planner.

1. In `config.yaml`, enable the `dataset` and `build` sections:
    ```yaml
    dataset:
      enabled: true
      path: "path/to/your/source_dataset.json"
      build:
        enabled: true # Activate build mode
    ```
2. Run the main script: `python main.py`
3. The generated training data will be saved in the corresponding timestamp folder in `data_reports/dataset/`.

## Usage: Single Model Benchmarking

To verify the effectiveness of the collaborative framework, the project provides a standalone benchmark script `single_model_only.py` to evaluate the performance of a single model directly solving the problem.

### 1. Single Query Mode (Benchmarking)

1. Run directly via command-line arguments:
    ```bash
    python single_model_only.py --query "your question" --model "gpt-4o" --timeout 60
    ```

### 2. Dataset Evaluation Mode (Benchmarking)

1. Modify the `dataset` section in `config.yaml` to point to the dataset you want to test.
2. Run the script; command-line arguments let you specify the model and timeout:
    ```bash
    python single_model_only.py --dataset "dataset/your_dataset.json" --limit 10 --model "gpt-4o" --timeout 120
    ```
3. Test results will be saved in the `data_reports/single_model_results/` directory, organized into subfolders by model name and timestamp.

**Command line argument description (`single_model_only.py`):**

- `--query`: The problem to be solved.
- `--dataset`: Path to the dataset file.
- `--model`: Specifies the name of the model to use (this will override the configuration file).
- `--limit`: The maximum number of questions to process in the dataset.
- `--timeout`: The timeout period (in seconds) for model requests.
- `--config`: Path to the configuration file (default is `config.yaml`).

## Auxiliary Tools

### Task Allocation Analysis Script

After running the dataset evaluation mode, you can use the `analyze_model_tasks.py` script to analyze the distribution of tasks between the large and small models.

**How to use:**

```bash
# python analyze_model_tasks.py <dataset_results.json path> [Optional: Difficulty Threshold]
python analyze_model_tasks.py data_reports/dataset/.../your_timestamp/dataset_results.json 5
```

## Evaluation Framework

To ensure system quality, we employ an evaluation framework that uses a strong LLM as a referee to score the **Planner** and **Executor** models. You can enable this feature in the `evaluation` section of `config.yaml`.

## Repository Structure

| File | Description |
| --- | --- |
| `main.py` | The main entry point for the **collaborative framework**. It handles argument parsing and coordinates the workflow based on the configuration. |
| `single_model_only.py` | A standalone script for **single model benchmarking**, used for performance comparison. |
| `config.yaml` | A central configuration file used for models, API keys, system settings, and dataset paths. |
| `execution.py` | Contains the core logic for parallel task execution, API calls, and planner streaming response processing within the collaborative framework. |
| `dataset_runner.py` | Manages batch processing (evaluation or construction) of datasets within the collaborative framework and generates comprehensive reports. |
| `evaluation.py` | Implements a referee evaluation framework for evaluating the performance of planners and executors. |
| `performance.py` | Contains the `PerformanceTracker` class, used to monitor metrics such as token usage, cost, and execution time. |
| `output_performance.py` | Calculates the theoretical performance benchmark based on model-specific latency and throughput data. |
| `task_metrics.py` | Calculates structural metrics of the task dependency graph (DAG), such as depth and compression ratio. |
| `config.py` | Defines the `ModelConfig` class for programmatically managing model configuration and API clients. |
| `api_pricing.py` | A utility module that provides the latest pricing information for various LLM APIs. |
| `analyze_model_tasks.py` | A script used to analyze and report on task allocation between small and large models. |
| `utils.py` | Contains general helper functions such as report path generation. |
| `log_config.py` | Configures the logger to save detailed runtime information to a file. |
