# NaturalProver

### Data
- See the instructions in the `gpt3ft` directory for preprocessing.

- Due to file sizes, we provide the raw `proofwiki.json` and the preprocessing code rather than the preprocessed datasets. All preprocessed datasets will be made publicly available upon acceptance to the conference, and the authors can provide them during the review process if needed.

- Theorem indices for the core evaluation set are found in `reduced_ixs.txt`.

- Additional data-related code is in `npgen/format_natproofs.py`.
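
As a rough illustration of how the core evaluation indices might be used, the sketch below reads a file of indices and keeps only the matching examples. It assumes that `reduced_ixs.txt` holds whitespace-separated integers and that each index refers to an example's position in a list. Neither assumption is confirmed here, and the helper names `load_indices` and `select_by_index` are hypothetical rather than part of this repository.

```python
from pathlib import Path


def load_indices(path):
    """Read integer indices from a text file.

    Assumption: the file contains whitespace-separated integers
    (for example, one index per line).
    """
    text = Path(path).read_text()
    return [int(token) for token in text.split()]


def select_by_index(items, indices):
    """Return the items whose list position appears in `indices`.

    Assumption: indices are positional. If they are theorem ids instead,
    filter on the id field of each record rather than on position.
    """
    wanted = set(indices)
    return [item for position, item in enumerate(items) if position in wanted]


# Hypothetical usage:
# core_ixs = load_indices("reduced_ixs.txt")
# core_examples = select_by_index(all_examples, core_ixs)
```

Indices that fall outside the list are ignored rather than raising an error, which keeps the sketch tolerant of a mismatch between the index file and the loaded split.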

#### Retrieved references

- Code for retrieving the references from the `naturalproofs` pretrained joint retriever is found in `npgen/get_retrievals.py`.



### GPT-3 training and generation
See the `gpt3ft` directory.

We provide this code because it contains the key methods used in our paper.
Additional code, including the GPT-2 / GPT-J models, will be released
upon acceptance to the conference; the authors can provide it during the review process as needed.

### Metrics
Evaluation metrics are found in `npgen/evaluation_proofgen.py`.
See `notebooks/evaluation_proofgen.ipynb` for an example of how to run the metrics on generations.


