# CCO Router: Continual Learning for API Tool Selection

A research framework for continual learning in large language model tool routing, enabling dynamic adaptation to new APIs and tools without catastrophic forgetting.

## Overview

This project implements and evaluates various routing strategies for LLM-based API tool selection, with a focus on continual learning approaches that can adapt to new tools over time while maintaining performance on previously learned tasks.

## Project Structure

```
CCO/
├── best_configs/                   # Best-performing configuration files
├── cco/                            # Main source code
│   ├── experiments/                # Training experiment outputs and checkpoints
│   ├── utils/                      # Utility modules (configs, data preparation, etc.)
│   ├── main.py                     # Training script entry point
│   ├── eval.py                     # Evaluation script entry point
│   ├── bayes_hyperparms_search.py  # Hyperparameter optimization
│   └── ...                         # Other core modules
├── configurations/                 # YAML configuration files
├── data/                           # Raw and processed datasets
│   ├── processed/                  # Cleaned and formatted datasets
│   ├── raw/                        # Original raw datasets
│   ├── process_data.py             # Data processing script
│   └── ...                         # Other data-related modules
├── results/                        # Evaluation results and metrics
└── pyproject.toml                  # Project dependencies and metadata
```

## Features

- **Continual Learning**: LoRA-based adaptation for new API tools without catastrophic forgetting
- **Multiple Retrievers**: Support for BM25, SentenceTransformer, SPLADE, and FlagEmbedding retrievers
- **Flexible Training**: Configurable training pipelines with YAML configurations
- **Comprehensive Evaluation**: Multi-metric evaluation with accuracy tracking
- **Hyperparameter Optimization**: Bayesian search for optimal hyperparameters
- **Experiment Tracking**: WandB integration for experiment monitoring

## Environment Setup

1. **Install UV Package Manager**
```bash
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
```

2. **Setup Environment**
```bash
# Create and activate environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -e .
```

3. **Configure Environment Variables**
```bash
# Create .env file
cp .env.example .env

# Add your API keys
echo "WANDB_API_KEY=your_wandb_key_here" >> .env
```

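Before running the scripts, it can help to confirm that the expected keys are available. The sketch below is a hypothetical helper, not part of the project: it assumes the `.env` file holds plain `KEY=VALUE` lines and that `WANDB_API_KEY` and `HF_API_TOKEN` are the keys you need. The project itself may load `.env` differently.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required, path=".env"):
    """Return required keys set neither in the process environment nor in the file."""
    file_values = read_env_file(path)
    return [k for k in required if not os.environ.get(k) and not file_values.get(k)]


if __name__ == "__main__":
    # Assumed key names: WANDB_API_KEY for tracking, HF_API_TOKEN for data preparation.
    missing = missing_keys(["WANDB_API_KEY", "HF_API_TOKEN"])
    print("Missing keys:", ", ".join(missing) if missing else "none")
```
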
## Data Preparation

Datasets are produced by `data/process_data.py`. The script reads `data/raw/`, normalizes each dataset with its processor, optionally adds retrieval results (`bm25`, `sentence_transformer`, `splade`, `flagembedding`), and writes JSONL files to `data/processed/` named `cleaned-{dataset}-{train|val|eval}.json`.

Run the script from the project root and provide a Hugging Face API token through your `.env` file or an environment variable. The script accesses the HF Hub and sets up `./.hf_cache` for model downloads.

Quick example:

```bash
# from project root (example)
HF_API_TOKEN=your_token_here python data/process_data.py --data mllm
```

Supported values for `--data`: `mllm`, `apibench-hf`, `apibench-tf`, `apibench-th`, `apibench-all`, `olympus-1`, `olympus-2`.

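To inspect a processed file, a small reader like the one below may be enough. It is a sketch that relies only on the naming pattern and the JSONL layout described above; `load_jsonl` and `processed_path` are hypothetical helpers, and no assumption is made about the fields inside each record.

```python
import json
from pathlib import Path


def processed_path(dataset, split, root="data/processed"):
    """Build the file path following the cleaned-{dataset}-{split}.json pattern."""
    if split not in {"train", "val", "eval"}:
        raise ValueError(f"unknown split: {split!r}")
    return Path(root) / f"cleaned-{dataset}-{split}.json"


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so broken files are easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err.msg})") from err
    return records
```

For example, `load_jsonl(processed_path("mllm", "train"))` would return the training records as a list of dictionaries once the dataset has been processed.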
## Quick Start

### Training Models

```bash
# Train on APIBench dataset
cco-train --config configurations/train_config.yaml \
    --experience_name apibench \
    --variant_name my_experiment

# Train on MLLM dataset
cco-train --config configurations/train_config.yaml \
    --experience_name mllm \
    --variant_name my_mllm_experiment

# Train with specific retriever
cco-train --config configurations/train_config.yaml \
    --experience_name apibench \
    --retriever bm25 \
    --variant_name bm25_experiment
```

### Evaluating Models

```bash
# Evaluate single adapter
cco-eval --config configurations/eval_config.yaml \
    --experience_name apibench \
    --lora_adapters my_experiment

# Evaluate with merging strategy
cco-eval --config configurations/eval_config.yaml \
    --experience_name apibench \
    --lora_adapters adapter1 adapter2 \
    --merging_strategy ties \
    --weights 1.0 1.0 \
    --density 0.3
```

### Hyperparameter Search

```bash
# Run Bayesian hyperparameter optimization
cco-hyperparam
```

## Configuration

### Training Configuration (`configurations/train_config.yaml`)

Key parameters:
- `experience_name`: Dataset to use ("apibench" or "mllm")
- `repo_id`: Base model identifier (default: "huggyllama/llama-7b")
- `retriever`: Retrieval method ("bm25", "sentence_transformer", "splade", "flagembedding")
- `epochs`: Number of training epochs
- `batch_size`: Training batch size
- `lr`: Learning rate
- `lora_r`, `lora_alpha`: LoRA configuration parameters

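To make the role of `lora_r` and `lora_alpha` concrete, the sketch below applies the standard LoRA formulation to a single linear layer: the frozen weight `W` is augmented with a rank-`r` product `B A`, scaled by `lora_alpha / lora_r`. It uses plain Python lists and hypothetical function names, and it assumes the standard scaling; the project's actual adapter code may differ.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x).

    W is d_out x d_in (frozen), A is r x d_in, B is d_out x r,
    and r (the number of rows of A) plays the role of lora_r.
    """
    r = len(A)
    scale = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# With B initialised to zeros, the adapter leaves the base output unchanged.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # rank r = 1
B_zero = [[0.0], [0.0]]
print(lora_forward(W, A, B_zero, [2.0, 3.0], lora_alpha=2.0))
```

Only `A` and `B` are trained, so each new tool set adds a small number of parameters while `W` stays fixed.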
### Evaluation Configuration (`configurations/eval_config.yaml`)

Key parameters:
- `lora_adapters`: List of adapter paths to evaluate
- `lora_merging_strategy`: Strategy for merging multiple adapters ("ties", "dare_linear", "arithmetic_mean")
- `eval_batch_size`: Evaluation batch size
- `temperature`: Generation temperature

## Available Retrievers
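The `ties` strategy follows the TIES-Merging recipe: trim each adapter's update to its largest-magnitude entries, elect a sign per parameter, and average only the entries that agree with that sign. The sketch below illustrates the idea on flat Python lists. It assumes `density` is the fraction of entries kept per adapter and that weights are applied after trimming; the function names are hypothetical, and the project's own merging code may differ in detail.

```python
def trim(vec, density):
    """Keep roughly the top `density` fraction of entries by magnitude, zero the rest.

    Entries tied with the threshold magnitude are all kept.
    """
    if not vec:
        return []
    k = max(1, round(len(vec) * density))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]


def ties_merge(task_vectors, weights, density):
    """Merge per-adapter update vectors with trim, sign election, and disjoint mean."""
    trimmed = [
        [w * v for v in trim(vec, density)]
        for vec, w in zip(task_vectors, weights)
    ]
    merged = []
    for column in zip(*trimmed):
        # Elect the sign carrying the larger total (weighted) magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * sign > 0]
        # Average only the surviving entries that agree with the elected sign.
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


adapter1 = [1.0, -2.0, 0.1, 3.0]
adapter2 = [-1.0, -4.0, 0.2, 0.05]
print(ties_merge([adapter1, adapter2], weights=[1.0, 1.0], density=0.5))
```

Entries with conflicting signs are not averaged against each other, which is the main difference from `arithmetic_mean`.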

- **BM25**: Classical sparse retrieval that scores documents by term frequency and inverse document frequency
- **SentenceTransformer**: Dense embedding-based retrieval
- **SPLADE**: Learned sparse retrieval built on BERT
- **FlagEmbedding**: Dense retrieval with embedding models from the FlagEmbedding library

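As an illustration of how the sparse baseline ranks candidates, here is a minimal Okapi BM25 scorer. It is a sketch under simple assumptions (lowercased whitespace tokens, default `k1` and `b` values, made-up API descriptions) and is not the retriever implementation used in this repository.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalisation penalises documents longer than average.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical API descriptions used only for illustration.
api_docs = [
    "translate text between languages",
    "classify images into categories",
    "detect objects in images",
]
scores = bm25_scores("classify images", api_docs)
best = max(range(len(api_docs)), key=scores.__getitem__)
print(api_docs[best])
```

Rare query terms receive a higher inverse document frequency, so a description matching `classify` outranks one that only matches the more common `images`.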
## Experiments

The framework supports several experimental configurations:

- **APIBench Baseline**: Standard fine-tuning on APIBench dataset
- **MLLM Integration**: Multi-modal language model approaches
- **Continual Learning**: Sequential learning with LoRA adapters
- **Retriever Comparison**: Systematic evaluation of different retrieval methods
- **Hyperparameter Optimization**: Automated search for optimal configurations

## Results and Metrics

Evaluation metrics include:
- **Accuracy**: Exact match accuracy for API name prediction
- **Accuracy Exist**: Percentage of predictions that are valid (existing) API names
- **Accuracy Domain**: Domain-level accuracy for API predictions

Results are saved to the `results/` directory as:
- `answers.jsonl`: Detailed prediction results
- `metrics.json`: Aggregated performance metrics

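The three metrics listed above could be computed along the following lines. This is a sketch under stated assumptions: predictions and references are plain API-name strings, and `api_domains` is a hypothetical mapping from every known API name to its domain. The evaluation script may define these quantities differently.

```python
def routing_metrics(predictions, references, api_domains):
    """Return exact-match, existence, and domain accuracy as fractions in [0, 1]."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total = len(references)
    if total == 0:
        return {"accuracy": 0.0, "accuracy_exist": 0.0, "accuracy_domain": 0.0}

    # Exact match: the predicted name equals the reference name.
    exact = sum(p == r for p, r in zip(predictions, references))
    # Exist: the predicted name is a known API, whether or not it is the right one.
    exist = sum(p in api_domains for p in predictions)
    # Domain: the predicted API belongs to the same domain as the reference API.
    domain = sum(
        p in api_domains and api_domains[p] == api_domains.get(r)
        for p, r in zip(predictions, references)
    )
    return {
        "accuracy": exact / total,
        "accuracy_exist": exist / total,
        "accuracy_domain": domain / total,
    }


# Hypothetical API names and domains, for illustration only.
domains = {"image-classifier": "vision", "object-detector": "vision", "speech-to-text": "audio"}
preds = ["image-classifier", "object-detector", "made-up-api", "speech-to-text"]
refs = ["image-classifier", "image-classifier", "speech-to-text", "image-classifier"]
print(routing_metrics(preds, refs, domains))
```
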
## Advanced Usage

### Resume Training
```bash
cco-train --config configurations/train_config.yaml \
    --resume_from experiments/checkpoint-500
```

### Memory Optimization
Enable low-memory mode in the configuration file:
```yaml
low_memory_mode: true
activation_checkpointing: true
```

## Citation

If you use this work in your research, please cite:

```bibtex
@article{cco-router-2025,
  title={Continual Learning for API Tool Selection in Large Language Models},
  author={Anonymous Author, ...,},
  journal={arXiv preprint},
  year={2025}
}
```

## License

See the `LICENSE` file for details.
