# Code for HaystackCraft

## Setup

```bash
conda create -n HaystackCraft python=3.10 -y
conda activate HaystackCraft
pip install -r requirements.txt
```

If you have trouble running Qwen2.5-1M models, you may create a separate environment with `requirements_0-7-2.txt`.

To evaluate OpenAI models, set your OpenAI API key:

```bash
export OPENAI_API_KEY=...
```

To evaluate Gemini models, set your Gemini API key:

```bash
export GEMINI_API_KEY=...
```
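Before a long evaluation run, it can help to confirm that the relevant key is actually exported. The sketch below is one way to do that; `require_env` is a hypothetical helper and not part of this repository.

```bash
# Hypothetical helper (not part of this repo): fail early when a required
# environment variable is missing or empty.
require_env() {
  # printenv only sees exported variables, which is what child processes inherit.
  if [ -z "$(printenv "$1")" ]; then
    echo "error: $1 is not set; run: export $1=..." >&2
    return 1
  fi
}
```

For example, `require_env GEMINI_API_KEY || exit 1` at the top of a wrapper script stops before any inference starts.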

## Static NIAH with Heterogeneous Retrieval Strategies

To evaluate an open-source LLM, first deploy a local LLM server with vLLM, for example:

```bash
vllm serve meta-llama/Llama-3.1-8B-Instruct --api-key token-abc123 --gpu-memory-utilization 0.95 --trust-remote-code --port 8000
```

You may need to set a Hugging Face token with `export HUGGING_FACE_HUB_TOKEN=...` to access certain LLMs.
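The server may take some time to load model weights, so a script that starts inference right after `vllm serve` can fail to connect. A minimal sketch of a readiness check follows, assuming `curl` is installed and the server runs on `localhost`; `wait_for_vllm` is a hypothetical helper, and the default key simply mirrors the `--api-key token-abc123` used above.

```bash
# Hypothetical helper (not part of this repo): poll the OpenAI-compatible
# /v1/models endpoint of a local vLLM server until it answers.
# Usage: wait_for_vllm PORT [API_KEY] [MAX_TRIES] [DELAY_SECONDS]
wait_for_vllm() {
  local port="$1"
  local key="${2:-token-abc123}"
  local tries="${3:-60}"
  local delay="${4:-5}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors such as 401.
    if curl -sf -o /dev/null --max-time 5 \
         -H "Authorization: Bearer $key" \
         "http://localhost:$port/v1/models"; then
      echo "server on port $port is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "server on port $port did not respond after $tries tries" >&2
  return 1
}
```

Calling `wait_for_vllm 8000` before the inference command blocks until the server started above responds, or gives up after the default 60 tries.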

Then run LLM inference:

```bash
python infer_static.py --llm MODEL_TO_EVALUATE --port PORT_YOU_USE_ABOVE --retriever RETRIEVER_FOR_HAYSTACK_CONSTRUCTION --context_size TARGET_CONTEXT_SIZE --order HAYSTACK_ORDERING
```

Add `--ppr` to enable graph-based reranking with Personalized PageRank (PPR) during haystack construction.
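To compare retrievers with and without PPR, the command above has to be repeated for each combination. The sketch below only prints the commands (a dry run) so they can be reviewed first; `static_cmds` is a hypothetical helper, and it uses only the flags shown above. The retriever names you pass in are your responsibility: check `infer_static.py` for the values it accepts.

```bash
# Hypothetical helper (not part of this repo). Dry run: print, but do not
# execute, one infer_static.py command per retriever, with and without --ppr.
# Usage: static_cmds LLM PORT CONTEXT_SIZE ORDER RETRIEVER [RETRIEVER...]
static_cmds() {
  local llm="$1" port="$2" context_size="$3" order="$4"
  shift 4
  local retriever ppr
  for retriever in "$@"; do
    for ppr in "" " --ppr"; do
      echo "python infer_static.py --llm $llm --port $port --retriever $retriever --context_size $context_size --order $order$ppr"
    done
  done
}
```

Once the printed commands look right, piping the output into `bash` runs them one after another.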

For evaluation, run for example:

```bash
python eval.py --result_dir results/bm25/Llama-3.1-8B-Instruct/8000/descending_order/
```
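After several inference runs, each result directory needs its own `eval.py` call. The sketch below lists them, assuming results sit exactly four levels below the root, as in the example path (`results/bm25/Llama-3.1-8B-Instruct/8000/descending_order/`). That depth is inferred from the one example, and `eval_cmds` is a hypothetical helper, so adjust it if your layout differs.

```bash
# Hypothetical helper (not part of this repo). Dry run: print one eval.py
# command per result directory found four levels below ROOT.
# ASSUMPTION: the four-level layout is inferred from the example path above.
# Usage: eval_cmds [ROOT]   (ROOT defaults to "results")
eval_cmds() {
  local root="${1:-results}"
  find "$root" -mindepth 4 -maxdepth 4 -type d | sort | while IFS= read -r dir; do
    echo "python eval.py --result_dir $dir/"
  done
}
```

Running `eval_cmds results` prints the commands; piping the output into `bash` executes them.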

## Dynamic NIAH

### Retrieval Environment Setup

#### BM25

Install Java 21, for example with SDKMAN:

```bash
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install java 21.0.3-tem
```
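If several JDKs are installed, the active one may not be the one just installed. A minimal sketch of a version check follows. Both functions are hypothetical helpers, and they assume `java -version` prints a line of the form `openjdk version "21.0.3" ...`.

```bash
# Hypothetical helpers (not part of this repo).
# Extract the major version from `java -version` output read on stdin,
# e.g. 'openjdk version "21.0.3" 2024-04-16 LTS' gives 21.
java_major() {
  sed -n 's/.*version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only when the `java -version` output on stdin reports major version 21.
is_java_21() {
  [ "$(java_major)" = "21" ]
}
```

Because `java -version` writes to stderr, redirect it when piping: `java -version 2>&1 | is_java_21 && echo "Java 21 is active"`.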

#### qwen3_0.6

Deploy a local embedding server with vLLM.

```bash
vllm serve Qwen/Qwen3-Embedding-0.6B --port QWEN_RETRIEVER_EMB_PORT --api-key token-abc123 --gpu-memory-utilization 0.95 --trust-remote-code --enforce-eager
```

### LLM Inference (Enforced Multi-Round)

```bash
python infer_multi.py --llm MODEL_TO_EVALUATE --port PORT_FOR_LOCAL_LLM --retriever RETRIEVER_FOR_HAYSTACK_CONSTRUCTION --emb_port IF_USE_QWEN_RETRIEVER_ABOVE --context_size TARGET_CONTEXT_SIZE --num_rounds NUM_REASONING_ROUNDS
```

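Pass `--emb_port` only when using the Qwen retriever; it should match the `QWEN_RETRIEVER_EMB_PORT` of the embedding server deployed above. Add `--ppr` to enable graph-based reranking with Personalized PageRank (PPR) during haystack construction.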

### LLM Inference (Variable-Round)

```bash
python infer_variable.py --llm MODEL_TO_EVALUATE --port PORT_FOR_LOCAL_LLM --retriever RETRIEVER_FOR_HAYSTACK_CONSTRUCTION --emb_port IF_USE_QWEN_RETRIEVER_ABOVE --context_size TARGET_CONTEXT_SIZE --max_rounds MAX_REASONING_ROUNDS
```

As with enforced multi-round inference, pass `--emb_port` only when using the Qwen retriever, and add `--ppr` to enable graph-based reranking with Personalized PageRank (PPR) during haystack construction.

### Evaluation

For example:

```bash
python eval_100.py --result_dir 2_round_results/qwen3_0.6/gemini-2.5-flash-lite/8000/descending_order
```
