# Why Does the Effective Context Length of LLMs Fall Short?
The supplementary materials contain the code and data for our paper "Why Does the Effective Context Length of LLMs Fall Short?"
### Data
The `data` folder contains the data used in our Needle-in-a-Haystack (NIAH) experiments, where the haystack is built from Paul Graham's essays.
Due to size limitations, we do not upload the other datasets used in this work; they can be downloaded from their official repositories:

* [RULER](https://github.com/hsiehjackson/RULER) 
* [InfiniteBench](https://github.com/OpenBMB/InfiniteBench)

We did not apply any preprocessing or postprocessing to the data.

### Environment Setup
```bash
conda create -n STRING python=3.8
conda activate STRING

pip install -r requirements.txt
# Optional: flash-attn must be installed separately
pip install flash-attn --no-build-isolation
```
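Since flash-attn is optional, it can be useful to confirm whether it was actually installed in the environment. The snippet below is a minimal sketch using only the standard library; the helper name `has_flash_attn` is hypothetical and not part of this repository.

```python
import importlib.util


def has_flash_attn() -> bool:
    """Return True if the optional flash_attn package is importable."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("flash_attn") is not None


if has_flash_attn():
    print("flash-attn is available")
else:
    print("flash-attn is not installed; see the optional install step above")
```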

### Running STRING 
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from string_monkey_patch import replace_with_string

# Replace the inference code with the STRING implementation
replace_with_string()
model_path = "meta-llama/Llama-3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
  model_path,
  torch_dtype=torch.bfloat16,
  trust_remote_code=True,
)
inputs = tokenizer("<An extremely long input here> Do you know STRING? ", return_tensors="pt")
# max_new_tokens bounds only the generated tokens; max_length would also count the long prompt
sample = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(sample[0]))
```

### Running our Needle-in-a-Haystack experiments 
We also provide code to reproduce the NIAH experiments in the paper:
```bash
python test_niah.py --model_path meta-llama/Llama-3.1-8B --max_length 131072
```
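For readers unfamiliar with the task, the general idea of a Needle-in-a-Haystack prompt can be sketched as follows. This is an illustration only and is not taken from `test_niah.py`: the function name `insert_needle`, the example needle, and the word-level (rather than token-level) positioning are all assumptions made for brevity.

```python
def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Insert `needle` into `haystack` at a relative depth.

    depth = 0.0 places the needle at the start, 1.0 at the end.
    Positions are counted in whitespace-separated words, a simplification
    of the token-level placement a real benchmark would likely use.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    words = haystack.split()
    idx = int(depth * len(words))
    return " ".join(words[:idx] + [needle] + words[idx:])


# Hypothetical needle and a tiny stand-in haystack for demonstration.
needle = "The secret number is 7421."
haystack = "Essays are long. " * 5
prompt = insert_needle(haystack, needle, depth=0.5)

# A retrieval is counted as successful if the key fact appears in the answer.
model_answer = "The secret number is 7421."  # placeholder for a model output
print("7421" in model_answer)
```

Sweeping `depth` and the haystack length gives the usual grid of retrieval accuracy by position and context size.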
