# ProCa

**Prototypical Calibration (ProCa)** adaptively learns a more robust decision boundary for zero- and few-shot text classification, rather than relying on greedy decoding. This codebase builds on [contextual calibration](https://github.com/tonyzhaozh/few-shot-learning), and we thank its authors for their excellent work. This repository supports zero- and few-shot text-classification evaluation of the original GPT baseline, contextual calibration and **ProCa** on GPT-2, GPT-Neo, GPT-J, BLOOM and other language models available in [HuggingFace Transformers](https://huggingface.co/models).
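For reference, the two baselines can be sketched as follows. This sketch assumes each test input has already been reduced to a vector of probabilities over the label words. `greedy_predict` and `contextual_calibrate_predict` are hypothetical names, not functions from this codebase. The calibration step follows the contextual calibration recipe (Zhao et al., 2021): estimate `p_cf` from a content-free input such as "N/A", then apply W = diag(p_cf)^-1 with b = 0.

```python
import numpy as np


def greedy_predict(label_probs):
    """Original GPT baseline: choose the label word with the highest probability."""
    return np.argmax(label_probs, axis=-1)


def contextual_calibrate_predict(label_probs, content_free_probs):
    """Contextual calibration with W = diag(p_cf)^-1 and b = 0."""
    # Renormalize over the label words only.
    p = label_probs / label_probs.sum(axis=-1, keepdims=True)
    p_cf = content_free_probs / content_free_probs.sum()
    # Row-wise division by p_cf equals multiplying each row by diag(p_cf)^-1.
    calibrated = p / p_cf
    calibrated /= calibrated.sum(axis=-1, keepdims=True)
    return np.argmax(calibrated, axis=-1)
```

Both functions take an array of shape `(num_examples, num_labels)`. Contextual calibration rescales each example independently using a single content-free estimate. ProCa instead fits its decision boundary to the distribution of many model outputs.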

## Installation
```
conda create -n fewshot python=3.7
conda activate fewshot
pip install -r requirements.txt
```

## Datasets
We support evaluation on SST-2, SST-5, AGNews, MR, AP (Amazon Polarity), Subj, DBPedia, RTE and TREC. To add another text-classification dataset, define its prompt format and label space in `data_utils.py`, following the existing datasets. Note that every label word you define must be encoded as a single token by the model's tokenizer.
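Before adding a dataset, it can help to check the single-token requirement up front. The helper below is a hypothetical sketch, not part of this repository. It makes two assumptions. First, labels are described as a dict mapping a label id to a list of label words. Second, label words appear after a leading space in the prompt. This second point matters because GPT-2-style BPE tokenizers encode `" Positive"` and `"Positive"` differently.

```python
from typing import Callable, Dict, List


def find_multi_token_labels(
    encode: Callable[[str], List[int]],
    label_dict: Dict[int, List[str]],
    prefix: str = " ",
) -> Dict[str, List[int]]:
    """Return each label word that does not encode to exactly one token."""
    offending = {}
    for words in label_dict.values():
        for word in words:
            # Assumed: the prompt places a space before the label word.
            token_ids = encode(prefix + word)
            if len(token_ids) != 1:
                offending[word] = token_ids
    return offending


# Example with a HuggingFace tokenizer (requires `transformers` and a model download):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
# find_multi_token_labels(
#     lambda text: tokenizer.encode(text, add_special_tokens=False),
#     {0: ["Negative"], 1: ["Positive"]},  # illustrative label words
# )
```

An empty result means every label word maps to one token. Otherwise, the returned token ids show how each offending word was split.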

## Evaluation
You can replicate the results in our paper by running the following script.
```
python run_classification.py \
--model=MODEL \
--dataset=DATASET \
--num_seeds=5 \
--start_seed=0 \
--all_shots="0,1,4,8" \
--bs=4 \
--gmm_train_estimate_scale=SIZE \
--method=METHOD 
```
Specifically, replace MODEL with one of \["gpt2-large", "gpt2-xl", "gptneo", "gptj", "bloom"\], METHOD with one of \["ori", "calibrate", "gmm_train_estimate"\] (the original GPT baseline, contextual calibration and ProCa, respectively), and DATASET with one of \["sst2", "sst5", "mr", "subj", "amazon_polarity", "agnews", "dbpedia", "rte", "trec"\]. Set SIZE to the estimate set size reported in our paper.
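The idea behind `gmm_train_estimate` can be illustrated with a simplified sketch. It assumes each example is represented by its probability vector over the label words and proceeds in three steps:

1. Fit a Gaussian mixture with one component per label on the unlabeled estimate set.
2. Map each component to a label.
3. Classify test examples by their most likely component.

Several parts of this sketch are simplifications chosen for the illustration: all function names, the diagonal covariance, the initialization and the brute-force cluster-to-label matching. The repository's implementation may differ on each of these points.

```python
import itertools

import numpy as np


def _responsibilities(X, weights, means, variances):
    """Posterior p(cluster | x) under a diagonal-covariance GMM, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]  # (n, k, d)
    log_prob = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    log_prob += np.log(weights)[None, :]
    log_prob -= log_prob.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_prob)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_diag_gmm(X, n_iter=100, eps=1e-6):
    """EM for a GMM with one component per label (k = number of label words)."""
    _, k = X.shape
    # Hypothetical init: for each label, the estimate example most confident in it.
    means = X[X.argmax(axis=0)].astype(float)
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = _responsibilities(X, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + eps  # M-step
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 0.0) + eps
    return weights, means, variances


def assign_clusters_to_labels(means):
    """Match clusters to labels, maximizing the summed mean probability of each matched label."""
    k = means.shape[0]
    best = max(itertools.permutations(range(k)),
               key=lambda perm: sum(means[c, perm[c]] for c in range(k)))
    return np.array(best)  # best[c] is the label assigned to cluster c


def proca_predict(estimate_probs, test_probs):
    """Fit on the estimate set, then label each test example by its most likely cluster."""
    weights, means, variances = fit_diag_gmm(estimate_probs)
    cluster_to_label = assign_clusters_to_labels(means)
    resp = _responsibilities(test_probs, weights, means, variances)
    return cluster_to_label[resp.argmax(axis=1)]
```

Brute-force matching over permutations is only practical for a handful of labels. A weighted bipartite matching solver scales better. Consider a toy estimate set whose two clusters sit around `[0.8, 0.2]` and `[0.55, 0.45]`. For that set, an input such as `[0.56, 0.44]` is assigned label 1, even though greedy decoding would pick label 0. This is the kind of shifted decision boundary ProCa is designed to learn.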
