# Artifact: Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

This repository provides code for evaluating models on the CanItEdit benchmark, along with the code to reproduce
EditPackFT and EditCoder, a dataset and an LLM built for instructional code editing.

## Dependencies

The Python dependencies required to run the code in this repository are listed in `requirements.txt`
and can typically be installed with `pip install -r requirements.txt`.
In addition, two other repositories are required:

- [MultiPL-E](https://github.com/nuprl/MultiPL-E)
- [finetuning-harness](https://github.com/cassanof/finetuning-harness)

Clone them as follows, from the root of this repository:

```bash
git clone https://github.com/nuprl/MultiPL-E
pushd ./editcoder
git clone https://github.com/cassanof/finetuning-harness
popd
```
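
To confirm that both clones ended up where the commands above place them, a small check along these lines may help. This is only a sketch: `check_clones` is a hypothetical helper, not part of this repository, and it assumes nothing beyond the two directory locations implied by the clone commands.

```bash
# Hypothetical helper (not part of this repository): verifies that the two
# external repositories exist at the locations the clone commands create.
# Takes an optional root directory; defaults to the current directory.
check_clones() {
  local root="${1:-.}"
  local status=0
  local dir
  for dir in "$root/MultiPL-E" "$root/editcoder/finetuning-harness"; do
    if [ -d "$dir" ]; then
      echo "found: $dir"
    else
      echo "missing: $dir"
      status=1
    fi
  done
  # Non-zero exit status if any expected directory is absent.
  return "$status"
}
```

After defining the function, run `check_clones` from the root of this repository; it prints one line per expected directory and returns a non-zero status if either clone is missing.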

## Structure

- `./benchmark` contains the CanItEdit benchmark dataset and code for generating and evaluating completions
- `./editcoder` contains code to train an EditCoder model
- `./editpackft` contains code to reproduce the EditPackFT dataset
