## Experiments Code Repository for "Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers"

To replicate our experiments, follow the steps below.

#### 1. Install the library
Install `uqlm` from PyPI:
```bash
pip install uqlm
```
You can find the UQLM repository [here](https://github.com/cvs-health/uqlm).

#### 2. Set up LLM credentials
Running our experiments without code changes requires access to the following LLMs:
- GPT-4o and GPT-4o-mini instances on Azure
- Gemini-2.5-Flash and Gemini-2.5-Flash-Lite instances on Vertex AI

Configure the API credentials for these models accordingly. If you do not have access to these models and would like to run our experiments with different LLMs, replace the `AzureChatOpenAI` and/or `ChatVertexAI` objects with any [LangChain chat model](https://js.langchain.com/docs/integrations/chat/) of your choice.
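
A minimal sketch of one way to configure credentials. It assumes the scripts construct `AzureChatOpenAI` and `ChatVertexAI` without passing keys explicitly, so LangChain falls back to its standard environment variables: `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `OPENAI_API_VERSION` for `langchain_openai`, and Google Application Default Credentials for Vertex AI. All values are placeholders; if the scripts load credentials another way, adapt accordingly.

```bash
# Placeholder values: replace with your own.
# Assumption: the experiment scripts let LangChain read credentials from
# these standard environment variables rather than passing them explicitly.

# Azure OpenAI (read by langchain_openai.AzureChatOpenAI)
export AZURE_OPENAI_API_KEY="<your-azure-openai-key>"
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
export OPENAI_API_VERSION="<api-version>"

# Vertex AI (ChatVertexAI authenticates via Application Default Credentials).
# Option A: log in interactively once:
#   gcloud auth application-default login
# Option B: use a service-account key file instead:
#   export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
export GOOGLE_CLOUD_PROJECT="<your-gcp-project-id>"
```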

#### 3. Run scripts
Run the following scripts from the `~/experiments` directory, in this order:
- `generate_and_score.py`
- `tune_ensemble_and_evaluate.py`
- `blackbox_nsamples_experiments.py`
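
The three scripts can be chained with a small helper that stops at the first failure, so a later step never runs after an earlier one has failed. This is a sketch under stated assumptions: `run_experiments` is a hypothetical name that is not part of the repository, each script is assumed to need no command-line arguments, and `python` on your `PATH` is assumed to point to the environment where `uqlm` is installed.

```bash
# Hypothetical helper: runs the experiment scripts in the documented order
# and stops at the first script that exits with a non-zero status.
run_experiments() {
  # Directory holding the scripts; defaults to ~/experiments as stated above.
  local dir="${1:-$HOME/experiments}"
  local scripts=(
    generate_and_score.py
    tune_ensemble_and_evaluate.py
    blackbox_nsamples_experiments.py
  )
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "$dir" || exit 1
    for script in "${scripts[@]}"; do
      echo "==> Running $script"
      # Assumes each script needs no command-line arguments.
      python "$script" || { echo "Failed: $script" >&2; exit 1; }
    done
  )
}
```

To use it, `source` the file and call `run_experiments`, or `run_experiments /path/to/experiments` if your checkout lives elsewhere.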
