# Finetuning Models on LintSeq vs Baseline Code Instruction Data


See `preparing_ft_data.md` for step-by-step instructions to reproduce the processing and generation of our baseline and LintSeq synthetic edit datasets.

The folder `finetuning_scripts` contains bash scripts to reproduce the core experiments reported in the paper. These scripts are organized by model.

Code to reproduce our model evaluations on HumanEval and MBPP is provided in `run_eval`. The contents of each file in this folder are described below.

```
-> src/run_eval
-----> humaneval     # HumanEval Evaluations
---------- / data.py           # Data loading and writing utilities 
---------- / run_eval.py       # Entry point for prompting and sampling from trained/finetuned LMs
---------- / run_exec.py       # Entry point for testing LM generated code & computing pass@k
---------- / evaluation.py     # Functional evaluation utilities
---------- / execution.py      # Functional execution utilities
-----> mbpp          # MBPP Evaluations
---------- / run_eval.py       # Entry point for prompting and sampling from trained/finetuned LMs
---------- / run_exec.py       # Entry point for testing LM generated code & computing pass@k
---------- / evaluation.py     # Functional evaluation utilities
---------- / execution.py      # Functional execution utilities
```
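
For readers unfamiliar with the pass@k metric that the `run_exec.py` entry points compute, here is a minimal sketch of the unbiased estimator commonly used with HumanEval: with `n` samples per problem, of which `c` pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). This is an illustration only and is not taken from `run_exec.py`; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, and the scripts in this repository may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: total number of samples generated for the problem
    c: number of those samples that passed all tests
    k: the k in pass@k (must satisfy 1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # One problem, 10 samples, 5 of which pass.
    print(pass_at_k(n=10, c=5, k=1))
```

Averaging the per-problem estimates, as `mean_pass_at_k` does, gives a benchmark-level score. Using exact integer binomials via `math.comb` avoids the overflow issues of a factorial-based formula.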