# ProtoLLM
We provide the code for ProtoLLM and ProtoLLM+weight. Below is an overview of the main files and directories in our codebase.

## Files
- **query_oracle_features.py**: Queries LLMs to obtain oracle features for a dataset. The results are stored in `oracle_features`.
- **query_weights.py**: Queries LLMs to obtain feature weights for a dataset. The results are stored in `gpt_weights`.
- **run.py**: Runs ProtoLLM.
- **run_weighted.py**: Runs ProtoLLM+weight, which uses the queried feature weights.

## Directories
- **data**: Contains the datasets used for training and evaluation, sourced from various repositories.
- **gpt_weights**: Stores the feature weights queried from LLMs, which are used by `run_weighted.py`.
- **oracle_features**: Stores the oracle features queried from LLMs for each dataset.
- **results**: Stores the outputs and evaluation metrics from model runs.



# Files to run
---
### run query_oracle_features.py
```bash
python query_oracle_features.py --data bank --query_times 10
```
---
### run query_weights.py
```bash
python query_weights.py --data bank --query_times 10
```
---
### run ProtoLLM
```bash
python run.py --data bank --shot 0 --seed 0 --gpt_shot 10 --dist_type euclidean
```
`--gpt_shot` is the number of oracle feature queries ($K$ in the paper).<br>
`--dist_type` is the distance metric (for example, `euclidean`).
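
To illustrate what a distance metric option typically controls, here is a minimal sketch of nearest-prototype classification. It assumes that a sample is assigned to the class whose prototype is closest, which this README does not spell out, so treat it as an illustration rather than the code in `run.py`. The function name `nearest_prototype` and the `manhattan` option are hypothetical; only `euclidean` appears in the commands above.

```python
import numpy as np


def nearest_prototype(X, prototypes, dist_type="euclidean"):
    """Return, for each row of X, the index of the closest prototype.

    Hypothetical helper: not taken from the ProtoLLM codebase.
    """
    X = np.asarray(X, dtype=float)            # shape (n_samples, n_features)
    P = np.asarray(prototypes, dtype=float)   # shape (n_classes, n_features)
    # Pairwise differences via broadcasting: (n_samples, n_classes, n_features)
    diff = X[:, None, :] - P[None, :, :]
    if dist_type == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    elif dist_type == "manhattan":  # assumed option name, for illustration only
        dist = np.abs(diff).sum(axis=-1)
    else:
        raise ValueError(f"unsupported dist_type: {dist_type}")
    return dist.argmin(axis=1)


# Two class prototypes and two samples, one near each prototype.
preds = nearest_prototype([[1, 1], [9, 8]], [[0, 0], [10, 10]])
print(preds.tolist())  # [0, 1]
```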

---
### run ProtoLLM+weight
```bash
python run_weighted.py --data bank --shot 0 --seed 0 --gpt_shot 10 --dist_type euclidean
```
`--gpt_shot` is the number of oracle feature queries ($K$ in the paper).
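
One common way to combine per-feature weights with a Euclidean distance is to scale each squared difference by its weight. The sketch below shows that idea only. How `run_weighted.py` actually applies the queried weights is not described in this README, and the function name `weighted_euclidean` is hypothetical.

```python
import numpy as np


def weighted_euclidean(x, prototype, weights):
    """Euclidean distance with a non-negative weight per feature.

    Hypothetical helper: an assumption about how feature weights
    could enter the distance, not the repository's implementation.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(prototype, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    # Each squared difference is scaled by its feature weight before summing.
    return float(np.sqrt(np.sum(w * (x - p) ** 2)))


# With equal weights this reduces to the ordinary Euclidean distance.
print(weighted_euclidean([0, 0], [3, 4], [1, 1]))  # 5.0
# A zero weight removes that feature from the comparison.
print(weighted_euclidean([0, 0], [3, 4], [1, 0]))  # 3.0
```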

---


# Packages
- numpy=2.01
- scikit-learn=1.4.2
- pandas=2.1.4
- openai=1.25.0
- langchain=0.2.14