This is anonymized code for the paper "Measuring AI Ability to Complete Long Software Tasks".

Note that this version of the code adds SWAA data for o3 and o4-mini, which was omitted from the main paper.

# Usage

```
pip install -r requirements.txt
```

After installing, you can run the example analysis notebook, `example_analysis.ipynb`, right away; it generates an analysis similar to Figure 1 of the paper.

Other plots can be generated with `dvc repro`; you may need to run `dvc init --no-scm` first. If `dvc repro` fails, you can find the individual stage commands in `dvc.yaml` and run them manually.
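
If you do need to run the commands by hand, a short script can pull them out of `dvc.yaml` for you. The sketch below is not part of this repository: the helper name `list_stage_commands` is hypothetical, and it assumes that PyYAML is installed and that the file follows the standard DVC layout (a top-level `stages` mapping whose entries have a `cmd` string or list). It prints commands in file order, which is not necessarily dependency order, so check each stage's `deps` before running them.

```python
from pathlib import Path


def list_stage_commands(config):
    """Return (stage_name, command) pairs from a parsed dvc.yaml mapping.

    Hypothetical helper for illustration; not part of this repository.
    """
    commands = []
    for name, stage in ((config or {}).get("stages") or {}).items():
        # Stages that use `foreach` keep their command template under `do`.
        body = stage.get("do", stage)
        cmd = body.get("cmd")
        if cmd is None:
            continue
        # `cmd` may be a single string or a list of strings run in order.
        for line in ([cmd] if isinstance(cmd, str) else cmd):
            commands.append((name, line))
    return commands


def main(path="dvc.yaml"):
    dvc_file = Path(path)
    if not dvc_file.exists():
        print(f"{path} not found; run this from the repository root.")
        return
    # Assumption: PyYAML is available in the environment.
    import yaml

    config = yaml.safe_load(dvc_file.read_text())
    for name, cmd in list_stage_commands(config):
        print(f"[{name}] {cmd}")


if __name__ == "__main__":
    main()
```

Run it from the repository root, then copy the printed commands into your shell one at a time. Any `${...}` placeholders in a command are DVC template variables and must be filled in by hand.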