# Project Overview

This project provides a framework for conducting diversity experiments using various datasets and models. Follow the instructions below to correctly set up your environment and run the experiments.

## Environment Installation

We use two conda virtual environments, `dataicl` and `vllm`. Their required libraries are listed in `requirements_dataicl.txt` and `requirements_vllm.txt`, respectively.

The Classification task requires the `dataicl` environment; all other tasks can use the `vllm` environment.
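
The setup commands themselves are not listed here. The following is a minimal sketch, assuming conda is on your `PATH` and both requirements files sit in the repository root. The Python version is an assumption (pick one your requirements support), and `setup_commands` and `env_for_task` are hypothetical helpers, not part of the repository. By default the script only prints the commands.

```python
import shlex
import subprocess

# Environment name -> requirements file, as listed above.
ENVIRONMENTS = {
    "dataicl": "requirements_dataicl.txt",
    "vllm": "requirements_vllm.txt",
}


def env_for_task(task):
    """Classification needs dataicl; every other task can use vllm."""
    return "dataicl" if task.lower() == "classification" else "vllm"


def setup_commands(env_name, requirements, python_version="3.10"):
    """Return the conda commands that create env_name and install its requirements.

    python_version is an assumption; the README does not specify one.
    """
    return [
        ["conda", "create", "-n", env_name, "-y", f"python={python_version}"],
        ["conda", "run", "-n", env_name, "python", "-m", "pip", "install", "-r", requirements],
    ]


def main(dry_run=True):
    for env_name, requirements in ENVIRONMENTS.items():
        for cmd in setup_commands(env_name, requirements):
            # Print each command; execute it only when dry_run is False.
            print(shlex.join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Call `main(dry_run=False)` to actually run the commands, then `conda activate` the environment that `env_for_task` returns for your task.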

## Model Import

To use the models, point the code at your local model directory. For example, in the Classification module, change the base path in the following line:
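```python
from os.path import join
# Base directory of the local Hugging Face models; replace it with your own.
model_path = join("/home/amax/exp/huggingface/transformers/", model_name)
```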

Alternatively, change the `model_path` passed to `from_pretrained` here:

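```python
import torch
from transformers import AutoModelForCausalLM

# Load the causal LM in bfloat16; device_map="auto" (requires accelerate) places it on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    return_dict=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```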
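
Rather than editing the hard-coded path in each module, one option is to resolve it in a single helper. This is a minimal sketch: `resolve_model_path` and the `MODEL_ROOT` environment variable are hypothetical names that the repository does not read, and the fallback is the path shown above.

```python
import os
from os.path import join

# Path hard-coded in the Classification module; used when nothing else is given.
DEFAULT_MODEL_ROOT = "/home/amax/exp/huggingface/transformers/"


def resolve_model_path(model_name, root=None):
    """Return the local directory of model_name.

    Precedence: the explicit `root` argument, then the MODEL_ROOT environment
    variable (a hypothetical name the repository does not read), then the default.
    """
    base = root or os.environ.get("MODEL_ROOT", DEFAULT_MODEL_ROOT)
    return join(base, model_name)
```

The returned string can be passed as `model_path` to `AutoModelForCausalLM.from_pretrained` as above.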

## Embedding Calculation

Before running any diversity experiments, compute the embeddings for all datasets by running `emb/embedding.sh`. For the Classification tasks, run the embedding script from the `./classification` directory (the trailing `&` runs it in the background):

```
bash scripts/embedding.sh &
```

In `embedding.sh`, you may need to set `dataset`, as well as `model_names` if you want embeddings other than all-roberta-large-v1.
You also need to change the local path of the embedding models here:
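```python
import os
# Local directory of sentence-transformers models (expanduser only expands a leading "~").
local_cache_dir = os.path.expanduser("/home/amax/exp/huggingface/sentence_transformers")
```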

After `embedding.sh` finishes successfully, the train and test embeddings are saved as `.pt` files under `./data/{dataset}`.
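
To confirm that the embedding step produced output before launching experiments, a small check like the following can help. It is a sketch: `find_embedding_files` is a hypothetical helper, and it relies only on the `./data/{dataset}` location and the `.pt` suffix, since the exact file names depend on `embedding.sh`.

```python
from pathlib import Path


def find_embedding_files(dataset, data_root="./data"):
    """Return the sorted .pt files under {data_root}/{dataset}.

    Raises FileNotFoundError if the folder is missing or holds no .pt files.
    """
    dataset_dir = Path(data_root) / dataset
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"{dataset_dir} does not exist; run embedding.sh first")
    files = sorted(dataset_dir.glob("*.pt"))
    if not files:
        raise FileNotFoundError(f"no .pt embedding files in {dataset_dir}")
    return files
```

Each returned file can then be loaded with `torch.load`.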

## Running Diversity Experiments

Taking the Classification task as an example, run the following from the `./classification` directory:
```
bash scripts/run_classification_0.sh
```

In `run_classification_0.sh`, you may need to set `test_dataset_names`, `train_dataset_names`, `k`, `model_names` (the language models), `embs` (the embedding models, e.g., all-roberta-large-v1), `exp_num` (the number of selected seeds), and `methods` (see `utils/icl_utils.py` for the available methods).
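
Each list-valued setting above adds a dimension to the set of runs. The following is a minimal sketch for estimating how many runs a configuration implies, assuming the script loops over every combination and repeats each one `exp_num` times; it may instead pair test and train datasets one-to-one, so check the script. `experiment_grid` is hypothetical and the configuration values are placeholders, not the repository's defaults.

```python
from itertools import product


def experiment_grid(config, exp_num):
    """Yield one settings dict per run.

    Takes the Cartesian product of the list-valued settings in `config` and
    repeats each combination exp_num times (seed_index 0..exp_num-1).
    """
    keys = list(config)
    for values in product(*(config[key] for key in keys)):
        for seed_index in range(exp_num):
            yield {**dict(zip(keys, values)), "seed_index": seed_index}


if __name__ == "__main__":
    # Placeholder values, not the repository's defaults.
    config = {
        "test_dataset_names": ["imdb"],
        "train_dataset_names": ["imdb"],
        "k": [4, 8],
        "model_names": ["model-a"],
        "embs": ["all-roberta-large-v1"],
        "methods": ["method-a", "method-b"],
    }
    runs = list(experiment_grid(config, exp_num=3))
    print(len(runs))  # 2 values of k * 2 methods * 3 repeats = 12
```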

After successfully running `run_classification_0.sh`, the results will be saved in the `./classification/results` directory.

## Evaluating Results

To evaluate and compare the results of the different methods, run:

```
python analyze_result.py > ans.out &
```

As with the run scripts, set the parameters you want to evaluate in the `main()` function of `analyze_result.py`. The output is redirected to `ans.out`.
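
When comparing methods over several seeds, a common summary is the mean and standard deviation of each method's scores. This is a minimal sketch of that step, assuming the per-seed scores have already been read into a `{method: [score, ...]}` dict; `summarize` and `print_summary` are hypothetical, and the layout of the files in `./classification/results` is not described in this README.

```python
from statistics import mean, stdev


def summarize(results):
    """Map each method to (mean, standard deviation) of its per-seed scores."""
    summary = {}
    for method, scores in results.items():
        # stdev needs at least two values; report 0.0 for a single seed.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[method] = (mean(scores), spread)
    return summary


def print_summary(summary):
    # List methods from highest to lowest mean score.
    ranked = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
    for method, (avg, spread) in ranked:
        print(f"{method:<20} {avg:.4f} ± {spread:.4f}")
```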

## Data Import

To use a dataset, make sure the task subdirectory contains a `data` folder and move the corresponding dataset into it. For example, to use the imdb dataset for Classification, make sure the `./classification/data/imdb` folder exists.
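
A minimal sketch for preparing that layout, assuming each dataset lives in its own folder inside the `data` folder of the task subdirectory (the same `./data/{dataset}` location the embedding step writes to). `dataset_dir` is a hypothetical helper; it only creates or checks the folder, since the expected file format inside it is not described here.

```python
from pathlib import Path


def dataset_dir(task_dir, dataset, create=False):
    """Return {task_dir}/data/{dataset}.

    With create=True, the folder (and the data folder above it) is created if
    missing; otherwise a missing folder raises FileNotFoundError.
    """
    path = Path(task_dir) / "data" / dataset
    if create:
        path.mkdir(parents=True, exist_ok=True)
    elif not path.is_dir():
        raise FileNotFoundError(f"{path} not found; move the {dataset} dataset there first")
    return path
```

For example, `dataset_dir("./classification", "imdb", create=True)` creates `./classification/data/imdb`, into which the imdb files can then be moved.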
