# DrBench Enterprise Research Benchmark


![drbench_banner.png](docs/drbench_banner.png)


**`DRBench`** is a first-of-its-kind benchmark designed to evaluate deep research agents on complex, open-ended **enterprise deep research tasks**.

It tests an agent's ability to conduct **multi-hop, insight-driven research** across public and private data sources, as a real enterprise analyst would.


### Quick Start

#### (0) Install Docker and build the services

Install Docker (https://www.docker.com/get-started/), then build the local service images:

```bash
cd services
make local-build
```

This takes around 30 minutes and only has to be done once.
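The Docker-backed environment in step (3) needs both the `docker` CLI and a running daemon. The helper below is a minimal sketch for checking this before choosing between the "with apps" and "no apps" paths. `docker_available` is a hypothetical helper, not part of DrBench; it only relies on the standard library and the `docker info` command, which exits non-zero when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_available() -> bool:
    """Hypothetical helper: True if the docker CLI exists and the daemon responds."""
    # Look for the `docker` executable on PATH.
    if shutil.which("docker") is None:
        return False
    # `docker info` returns a non-zero exit code when the daemon is unreachable.
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Fall back to running without the enterprise apps when Docker is unavailable.
    no_docker = not docker_available()
    print("no_docker =", no_docker)
```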

#### (1) Install DrBench

From the repository root, install the package in editable mode:
```bash
pip install -e .
```

#### (2) Run the minimal example

```bash
python main.py
```

This loads task DR0001, generates a basic report, and saves the results under `results/minimal`.

#### (3) Test your DR Agent

```python
import argparse

from drbench import drbench_enterprise_space, task_loader
from drbench.agents.drbench_agent.drbench_agent import DrBenchAgentDummy
from drbench.score_report import score_report

# Parse a --no-docker flag so the example can also run without the apps
parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-docker", action="store_true", help="run without the Docker-hosted apps"
)
args = parser.parse_args()

# Step 0 - Load one task
task = task_loader.get_task_from_id(task_id="DR0001")
print(task.summary())

# Step 1 - Bring your own agent (a dummy agent is used here)
dr_agent = DrBenchAgentDummy(model="gpt-4o-mini")

# Step 2 - Set up the environment, with or without the enterprise apps
if args.no_docker:
    # No apps: the agent runs without the enterprise environment
    env = None
else:
    # With apps: start the task's enterprise services via Docker
    env = drbench_enterprise_space.DrBenchEnterpriseSearchSpace(
        task=task.get_path(),
        start_container=True,
    )

# Step 3 - Generate the report (bring your own agent)
report = dr_agent.generate_report(
    query=task.get_task_config()["dr_question"], env=env
)

# Step 4 - Evaluate the report
score_dict = score_report(
    predicted_report=report,
    task=task,
    savedir="results/minimal",
)

# Step 5 - Print scores
print("Insights Recall: ", score_dict["insights_recall"])
```
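To test your own agent in place of `DrBenchAgentDummy`, it presumably needs to support the same call the example makes: `generate_report(query=..., env=...)`. The class below is a minimal sketch assuming that is the whole interface and that a plain-text string is an acceptable report for `score_report`; neither assumption is confirmed by this README. `MyDrAgent` and `_research` are hypothetical names, and the research step is only a placeholder.

```python
from typing import Any, List, Optional


class MyDrAgent:
    """Hypothetical agent mirroring the generate_report(query=..., env=...) call."""

    def __init__(self, model: str = "gpt-4o-mini") -> None:
        self.model = model

    def generate_report(self, query: str, env: Optional[Any] = None) -> str:
        # Collect findings, then assemble them into a plain-text report.
        # Assumption: score_report accepts a string report.
        findings = self._research(query, env)
        lines = [f"# Report: {query}", ""]
        lines += [f"- {finding}" for finding in findings]
        return "\n".join(lines)

    def _research(self, query: str, env: Optional[Any]) -> List[str]:
        # Placeholder: a real agent would search public sources and, when env
        # is not None, the enterprise apps exposed by the DrBench environment.
        source = "public web only" if env is None else "public web + enterprise apps"
        return [f"No findings yet for '{query}' ({source})."]


if __name__ == "__main__":
    agent = MyDrAgent()
    print(agent.generate_report(query="What drove Q3 churn?", env=None))
```

Swapping `dr_agent = DrBenchAgentDummy(...)` for `dr_agent = MyDrAgent()` would then leave Steps 2 to 5 unchanged, provided the interface assumption holds.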
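The example prints `insights_recall`, but this README does not define how it is computed. Read as standard recall, it would be the fraction of ground-truth insights for the task that the report covers. The sketch below illustrates only that definition. `insights_recall`, `keyword_match` and the toy data are hypothetical, and the substring check is a stand-in for whatever matching DrBench actually uses.

```python
from typing import Callable, List


def insights_recall(
    report: str,
    gold_insights: List[str],
    is_covered: Callable[[str, str], bool],
) -> float:
    """Hypothetical: share of gold insights that is_covered finds in the report."""
    # Recall is undefined with no gold insights; return 0.0 by convention here.
    if not gold_insights:
        return 0.0
    hits = sum(1 for insight in gold_insights if is_covered(insight, report))
    return hits / len(gold_insights)


def keyword_match(insight: str, report: str) -> bool:
    # Toy stand-in for a real judge: case-insensitive substring test.
    return insight.lower() in report.lower()


if __name__ == "__main__":
    report = "Q3 churn rose to 8%. The EU launch slipped to 2025."
    gold = ["churn rose to 8%", "the eu launch slipped", "pricing was cut"]
    print(insights_recall(report, gold, keyword_match))  # prints 0.6666666666666666
```

Keeping the matcher as a parameter separates the recall arithmetic from the harder problem of deciding whether a report states a given insight.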
