# Agent Context Optimization (ACON): Optimizing Context Compression for Long-horizon LLM Agents

`acon` is a research framework for optimizing context compression in long-horizon LLM agents. It focuses on minimizing redundant memory growth while preserving the information the agent needs for decision-making.

It provides standardized pipelines for **environments, agents, context compression, and distillation (compressor and agent)** across multiple realistic benchmarks such as **AppWorld**, **OfficeBench**, and **8-objective QA**.

## Table of Contents
- [🚀 Quickstart (AppWorld)](#-quickstart-appworld)
- [🛠️ Installation](#️-installation)
- [📚 Repository Structure](#-repository-structure)
- [📊 Benchmarks](#-benchmarks)
  - [AppWorld](experiments/appworld/README.md)
  - [8-objective QA](experiments/smolagents/README.md)
  - [OfficeBench](experiments/officebench/README.md)


## 🚀 Quickstart (AppWorld)

Install the [AppWorld](https://github.com/StonyBrookNLP/appworld) environment (see the [AppWorld README](experiments/appworld/README.md) for details):
```bash
git lfs install
git clone https://github.com/StonyBrookNLP/appworld
cd appworld
pip install -e .
appworld install --repo
appworld download data
```

Run the AppWorld agent with history compression:

```bash
# Move the downloaded AppWorld data into the acon AppWorld experiment folder.
mv /path/to/appworld/data /path/to/acon/experiments/appworld
cd /path/to/acon
pip install -e .
# Place your OpenAI API key in `configs/private_config.yaml`.

cd experiments/appworld
python run_all.py \
    --split train \
    --model_name gpt-4.1-mini \
    --tag baseline \
    --co_config_path configs/context_opt/gpt-4.1-mini_history.yaml
```

Results will be saved in:

```
experiments/appworld/outputs/gpt-4.1-mini_baseline/
```
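The run directory in the example appears to combine `--model_name` and `--tag`. The sketch below assumes that `<model_name>_<tag>` naming, which is inferred from the example and not confirmed against `run_all.py`. It builds the Quickstart command as an argument list and computes where the results should appear. `build_command` and `expected_output_dir` are hypothetical helpers, not part of `acon`.

```python
from pathlib import Path

# Assumption: run directories are named "<model_name>_<tag>", inferred from the
# Quickstart example. Check run_all.py if your outputs land elsewhere.
OUTPUTS_ROOT = Path("experiments/appworld/outputs")


def build_command(split: str, model_name: str, tag: str, co_config_path: str) -> list[str]:
    """Assemble the Quickstart command as an argument list (hypothetical helper)."""
    return [
        "python", "run_all.py",
        "--split", split,
        "--model_name", model_name,
        "--tag", tag,
        "--co_config_path", co_config_path,
    ]


def expected_output_dir(model_name: str, tag: str) -> Path:
    """Return the directory where results are expected, relative to the repo root."""
    return OUTPUTS_ROOT / f"{model_name}_{tag}"


if __name__ == "__main__":
    cmd = build_command(
        split="train",
        model_name="gpt-4.1-mini",
        tag="baseline",
        co_config_path="configs/context_opt/gpt-4.1-mini_history.yaml",
    )
    print(" ".join(cmd))
    print(expected_output_dir("gpt-4.1-mini", "baseline"))
```

To launch the run from Python, the list can be passed to `subprocess.run(cmd, cwd="experiments/appworld", check=True)`.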

To optimize the compression guidelines and distill them into a local model, follow the per-benchmark instructions linked under [Benchmarks](#-benchmarks).


## 🛠️ Installation

### Prerequisites
- Python 3.11+

### Basic Installation

```bash
pip install -e .
```

### Configuration

Place your OpenAI API key in `configs/private_config.yaml`:

```yaml
openai_key: "your_api_key_here"
```
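Before starting a long run, it can help to confirm that the key file exists and no longer holds the placeholder. The following is a minimal sketch of such a check. `read_openai_key` is a hypothetical helper, not part of `acon`, and it only handles the flat `openai_key: "..."` layout shown above; it is not a general YAML parser.

```python
from pathlib import Path

# Placeholder value from the README example; a real key must replace it.
PLACEHOLDER = "your_api_key_here"


def read_openai_key(config_path: str = "configs/private_config.yaml") -> str:
    """Return the `openai_key` value, or raise if it is missing or unset.

    Hypothetical helper: handles only a flat `key: "value"` line.
    """
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"{config_path} not found")
    for line in path.read_text(encoding="utf-8").splitlines():
        # Split on the first colon only, so colons inside the value survive.
        name, sep, value = line.partition(":")
        if sep and name.strip() == "openai_key":
            key = value.strip().strip("\"'")
            if not key or key == PLACEHOLDER:
                raise ValueError("openai_key is empty or still the placeholder")
            return key
    raise KeyError("openai_key not found in config")
```

Calling `read_openai_key()` from the repository root raises a descriptive error if the file is absent or the key was never filled in.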

## 📚 Repository Structure

```text
acon/
├── configs/                # API config
├── experiments/            # benchmark runners (AppWorld, OfficeBench, QA) & utils for fine-tuning and prompt optimization
├── src/productive_agents/  # implementations for environments, agents, and context compressors
└── README.md
```

## 📊 Benchmarks

We currently support three benchmark families:

| Benchmark | Description | Folder |
|------------|--------------|---------|
| **AppWorld** | Day-to-day personal task workflows | [`experiments/appworld`](experiments/appworld) |
| **OfficeBench** | Office productivity automation | [`experiments/officebench`](experiments/officebench) |
| **8-objective QA** | Lightweight reasoning & retrieval tasks | [`experiments/smolagents`](experiments/smolagents) |

All benchmarks follow the same experimental pipeline:
1. Run baseline experiments with GPT models
2. Optimize context compression guidelines
3. Distillation Stage 1 (Compressor LoRA)
4. Distillation Stage 2 (Agent LoRA)
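
The four steps above can be tracked as an ordered sequence when scripting runs across benchmarks. This is only an illustrative sketch: the `Stage` names and the `next_stage` helper are hypothetical and do not correspond to anything in `acon`, and it assumes the stages run strictly in the listed order.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Pipeline stages in the order listed in this README (names are hypothetical)."""
    BASELINE = 1
    GUIDELINE_OPTIMIZATION = 2
    COMPRESSOR_DISTILLATION = 3
    AGENT_DISTILLATION = 4


def next_stage(completed: set) -> Optional[Stage]:
    """Return the earliest stage not yet completed, or None when all are done."""
    for stage in Stage:  # Enum iteration follows definition order.
        if stage not in completed:
            return stage
    return None


if __name__ == "__main__":
    done = {Stage.BASELINE}
    print(next_stage(done).name)
```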

For detailed per-benchmark instructions, please refer to:
- [AppWorld README](experiments/appworld/README.md)
- [8-objective QA README](experiments/smolagents/README.md)
- [OfficeBench README](experiments/officebench/README.md)
