# Setup

- Prepare datasets. Each dataset needs `train.jsonl`, `dev.jsonl`, and `test.jsonl` files. For some datasets this means converting from the original format; detailed conversion instructions are not yet available (TBA).
    - HellaSWAG and CSQA can be used in their original format and only need to be split into the appropriate train/dev/test files.
    - PIQA and aNLI can be used almost as-is, but the labels from the separate label file must be merged into the "label" key of each record in the jsonl files.
    - All binary classification tasks can be formatted as `{"text": "...", "label": (0 or 1), "id": "xxx"}` and use the "mr" data reader.
    - Data readers are in `utils_text_classification.py` and `utils_multiple_choice.py`.
- Modify `runconfigs/data_paths.json` as needed.
- Optionally, comment out datasets/methods in `make_paper_configs.py` to run just a subset of experiments.
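
The label merge and the binary classification format described above can be sketched as follows. This is a minimal sketch, not the repository's own conversion code: the helper names `merge_labels` and `binary_record` are hypothetical, and it assumes the label file holds one integer label per line, in the same order as the records in the jsonl file.

```python
import json
from pathlib import Path


def merge_labels(records_path, labels_path, out_path, cast=int):
    """Copy records_path to out_path, adding a "label" key to each record.

    Assumption: labels_path has one label per line, aligned with the records.
    Returns the number of records written.
    """
    record_lines = Path(records_path).read_text(encoding="utf-8").splitlines()
    label_lines = Path(labels_path).read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in record_lines if line.strip()]
    labels = [cast(line.strip()) for line in label_lines if line.strip()]
    if len(records) != len(labels):
        raise ValueError(
            f"{len(records)} records but {len(labels)} labels; files are misaligned"
        )
    with open(out_path, "w", encoding="utf-8") as f:
        for record, label in zip(records, labels):
            record["label"] = label
            f.write(json.dumps(record) + "\n")
    return len(records)


def binary_record(text, label, record_id):
    """Build one record in the {"text", "label", "id"} binary classification format."""
    if label not in (0, 1):
        raise ValueError(f"binary label must be 0 or 1, got {label!r}")
    return {"text": text, "label": label, "id": record_id}
```

For example, `merge_labels("train_raw.jsonl", "train-labels.lst", "train.jsonl")` would write the merged training file; the input file names here are placeholders for wherever the original data lives.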

# Running

1. Generate configs by running `python make_paper_configs.py` and note the number of configs N printed at the end.
2. Run the generated configs, replacing N with the value from step 1: `python run_experiment.py --experiments_dir ../data/experiments/ --num_trials 10 run_al_mc.py {1..N}`
3. Collect the results, again replacing N: `python summarize_experiments.py --experiments_dir ../data/experiments/ experiment_v{1..N} > records.jsonl`
4. Print the results in table form: `python make_tables.py --experiments_dir ../data/experiments/ records.jsonl`

Note: Running all the experiments can take an extremely long time, even on a multi-GPU server. You might want to lower `--num_trials` in the run command or restrict `make_paper_configs.py` to a subset of datasets/models.
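
Because `{1..N}` relies on shell brace expansion with a literal number, it can be convenient to build the argument lists programmatically. The sketch below assembles the three commands from the steps above for a given N; `build_commands` is a hypothetical helper, and the script names and flags are taken verbatim from this README rather than checked against the scripts themselves.

```python
import shlex


def build_commands(n_configs, experiments_dir="../data/experiments/", num_trials=10):
    """Return the run, summarize, and table commands as argument lists."""
    indices = [str(i) for i in range(1, n_configs + 1)]
    run = [
        "python", "run_experiment.py",
        "--experiments_dir", experiments_dir,
        "--num_trials", str(num_trials),
        "run_al_mc.py", *indices,
    ]
    summarize = [
        "python", "summarize_experiments.py",
        "--experiments_dir", experiments_dir,
        *[f"experiment_v{i}" for i in indices],
    ]
    tables = [
        "python", "make_tables.py",
        "--experiments_dir", experiments_dir,
        "records.jsonl",
    ]
    return run, summarize, tables


if __name__ == "__main__":
    # Print the commands for a small N so they can be inspected before running.
    for command in build_commands(3, num_trials=2):
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run`; for the summarize step, its standard output would need to be redirected to `records.jsonl`, matching the `>` in step 3.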

