# ToolWeave: Fine-Grained and Controllable Synthetic Data Generation for Multi-Turn Tool Calling with Non-Frontier LLMs

## Overview
This workspace contains prompt templates powering a multi-stage synthetic data pipeline for controllable multi-turn tool-use dialogues. The pipeline proceeds through four stages:
- Tool Generation
- Tool Graph Traversal & Goal Generation
- Plan Partitioning & Subgoal Generation
- Multi-Agent Dialogue Synthesis

Each stage’s prompts are modular so you can plug in different models while preserving structural guarantees (connectivity, parameter grounding, etc.).

## Repository Structure
```
tool_graph_synthesizer/   # Stage 1: Seed + expansion of an API/tool universe
tool_graph_sampler/       # Stage 2: Sampling / traversing tool graphs + goal creation
plan_generator/           # Stage 3: Partitioning tool chains + utterance scaffolds
dialogue_synthesizer/     # Stage 4: Multi-agent role prompts to produce dialogues
README.md
```

## Stage 1: Tool Generation (API Universe Construction)
Directory: [tool_graph_synthesizer](tool_graph_synthesizer)

Prompts:
- [seed_apis_prompt.txt](tool_graph_synthesizer/seed_apis_prompt.txt): Produces foundational "seed" APIs with enforced schema discipline (enums, date formats, required inference).
- [expand_entities_prompt.txt](tool_graph_synthesizer/expand_entities_prompt.txt): Adds new APIs that bridge uncovered entities while guaranteeing connectivity via shared parameters.
- [complexify_prompt.txt](tool_graph_synthesizer/complexify_prompt.txt): Introduces higher-complexity APIs (nested params, mixed types, defaults).
- [connectivity_prompt.txt](tool_graph_synthesizer/connectivity_prompt.txt): Grows multi-step chains by adding realistic downstream actions.
- [pattern_fan_out_prompt.txt](tool_graph_synthesizer/pattern_fan_out_prompt.txt): Generates "fan-out" parallel APIs all consuming a distributor output.
- [api_description_paraphrase_prompt.txt](tool_graph_synthesizer/api_description_paraphrase_prompt.txt): Style-controlled paraphrasing of API descriptions.
- [param_paraphrase_prompt.txt](tool_graph_synthesizer/param_paraphrase_prompt.txt): Refactors parameter names/descriptions.
- [enum_refinement_prompt.txt](tool_graph_synthesizer/enum_refinement_prompt.txt): Synthesizes plausible enum value sets.
- [required_params_prompt.txt](tool_graph_synthesizer/required_params_prompt.txt): Deduces minimal required parameters.
- [default_value_prompt.txt](tool_graph_synthesizer/default_value_prompt.txt): Supplies sensible defaults for optional params.
- [batch_api_connection_prompt.txt](tool_graph_synthesizer/batch_api_connection_prompt.txt): Validates logical output -> input parameter linkage.

Outputs of stage 1: A connected, normalized tool catalog.

## Stage 2: Tool Graph Sampling & Goal Generation
Directory: [tool_graph_sampler](tool_graph_sampler)

Prompts:
- [beam_search_generate_goal_prompt.txt](tool_graph_sampler/beam_search_generate_goal_prompt.txt): Generates abstract high-level goals from a linear tool sequence.
- [beam_search_score_goal_prompt.txt](tool_graph_sampler/beam_search_score_goal_prompt.txt): Scores candidate goals for relevance + coherence (used in beam/pruning loops).
- [generate_fan_out_goal_prompt.txt](tool_graph_sampler/generate_fan_out_goal_prompt.txt): Crafts goals motivating fan-out / fan-in workflows.
- [generate_conditional_goal_prompt.txt](tool_graph_sampler/generate_conditional_goal_prompt.txt): Produces conditional branching goals with explicit variable-based decision logic.

Outputs of stage 2: Curated goal + tool-path pairs (linear, fan-out, conditional variants).

## Stage 3: Plan Partitioning & Subgoal Generation
Directory: [plan_generator](plan_generator)

Prompts:
- [partition_goal_prompt.txt](plan_generator/partition_goal_prompt.txt): Splits a full ordered tool path into logical sub-task group sizes.
- [generate_sub_goals_prompt.txt](plan_generator/generate_sub_goals_prompt.txt): Generates user-style utterances per group (no concrete values).
- [generate_conditional_utterance_prompt.txt](plan_generator/generate_conditional_utterance_prompt.txt): Produces the next utterance once a runtime condition is met.

Outputs of stage 3: Structured execution plans + natural scaffold utterances.

## Stage 4: Multi-Agent Dialogue Synthesis
Directory: [dialogue_synthesizer](dialogue_synthesizer)

Prompts (agent roles):
- [assistant_tool_caller.txt](dialogue_synthesizer/assistant_tool_caller.txt): Extracts + normalizes parameters for a tool invocation from memory/history.
- [tool_agent.txt](dialogue_synthesizer/tool_agent.txt): Simulates tool execution with schema-faithful JSON outputs and decision-variable injection.
- [assistant_tool_response_summarizer.txt](dialogue_synthesizer/assistant_tool_response_summarizer.txt): Summarizes tool outputs conversationally.
- [assistant_clarifier.txt](dialogue_synthesizer/assistant_clarifier.txt): Requests missing required parameters in a single clarifying question.
- [user_clarifier.txt](dialogue_synthesizer/user_clarifier.txt): Simulates user supplying exactly requested parameters.
- [user_utterer.txt](dialogue_synthesizer/user_utterer.txt): Generates user turns that naturally express parameter values per plan step.
- [memory_agent.txt](dialogue_synthesizer/memory_agent.txt): Maintains canonical slot/memory cache.
- [user_utterance_paraphraser.txt](dialogue_synthesizer/user_utterance_paraphraser.txt): Paraphrases user utterances for style diversity.
- [user_clarification_paraphraser.txt](dialogue_synthesizer/user_clarification_paraphraser.txt): Paraphrases user clarifications for style diversity.

Outputs of stage 4: Fully realized multi-turn dialogues with grounded tool calls, realistic parameter/value phrasing, and controlled branching.

## End-to-End Flow Summary
1. Stage 1 builds a richly connected API/tool universe (seed -> expansion -> complexity -> fan-out).
2. Stage 2 samples coherent tool chains and synthesizes abstract goals (linear / conditional / parallel).
3. Stage 3 partitions chains into executable sub-task groups and drafts utterance scaffolds.
4. Stage 4 runs a controlled multi-agent simulation (user, assistant, tool simulator, memory) to produce final dialogue datasets (tool call JSON + natural language).
