# ShufflEval

Supplementary material to accompany the anonymous ICLR submission.

The experimental design relies on caching all LM queries. This enables resuming and rerunning the 
experiments quickly, and makes it convenient to do most of the experiments in notebooks. 
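
The caching idea can be illustrated with a minimal sketch. It is not the code in `language_model.py`: the names `CachedLM`, `EchoLM`, and `_query`, the SHA-256 cache key, and the single JSON cache file are all assumptions made for illustration.

```python
import hashlib
import json
from abc import ABC, abstractmethod
from pathlib import Path


class CachedLM(ABC):
    """Hypothetical base class that stores every query result in a JSON file."""

    def __init__(self, cache_path):
        self.cache_path = Path(cache_path)
        # Reload earlier results so an interrupted run can resume.
        if self.cache_path.exists():
            self.cache = json.loads(self.cache_path.read_text(encoding="utf-8"))
        else:
            self.cache = {}

    @abstractmethod
    def _query(self, prompt):
        """Send the prompt to the underlying model (implemented by subclasses)."""

    def __call__(self, prompt):
        # Assumption: the prompt alone identifies a query.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.cache[key] = self._query(prompt)
            # Persist after every new query so no result is lost.
            self.cache_path.write_text(
                json.dumps(self.cache, ensure_ascii=False), encoding="utf-8"
            )
        return self.cache[key]


class EchoLM(CachedLM):
    """Stand-in model for illustration only; counts uncached calls."""

    def __init__(self, cache_path):
        super().__init__(cache_path)
        self.calls = 0

    def _query(self, prompt):
        self.calls += 1
        return prompt.upper()
```

With a cache of this kind, rerunning a notebook cell only triggers model calls for prompts that have not been seen before, including across separate sessions.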

# Reproducing

To reproduce the Wikipedia experiment, run `python get_wikipedia.py` and then run the notebook `wiki_low_resource.ipynb`. This creates a `plots/` subdirectory.


To reproduce the conlang experiment, run the notebook `conlang.ipynb`.


Scrape Wikipedia
* `get_wikipedia.py`: run `python get_wikipedia.py` to scrape the data. The script automatically determines which Wikipedias have enough data.


Helper files imported by the notebooks and `get_wikipedia.py`
* `metrics.py` contains the reference-based evaluation metric
* `language_model.py` contains the abstract LM class, which caches calls
* `openai_lm.py` contains the `OpenAILM(LM)` subclass
* `utils.py` contains basic utilities, such as loading and saving .json files

# Data for conlangs

* We provide the complete generated conlangs, texts, and translations in `conlangs/*.json`, one file per conlang
* To browse this data, open any `conlangs/*.json` file
* Each conlang .json file includes:
    * Language name
    * Species name
    * Unique property 
    * Conculture
    * Conlang1 -- this defines the conlang
    * `parallel_texts` -- the 10 source texts along with their target translations, each split into sentences
    * Conlang2 -- any additional vocabulary needed to make these texts translatable
    * Scores and translations from the 13 LM-based translators for each of the 10 texts
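
As a quick way to see what each file contains, the sketch below lists the top-level keys of every conlang file. The helper name `summarize_conlangs` is hypothetical, and the sketch assumes only that each file is a JSON object; the exact key spellings are whatever the files themselves use.

```python
import json
from pathlib import Path


def summarize_conlangs(directory="conlangs"):
    """Map each .json file name in `directory` to its sorted top-level keys."""
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: each file holds a JSON object; anything else yields no keys.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    for name, keys in summarize_conlangs().items():
        print(name, keys)
```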

# Data for Wikipedia

* In `data/wiki_source`, we provide the 10 source texts for each language.
* Each file is a .json file containing 10 articles, each in both the source language and English.