# DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models

We provide a Python implementation of the paper "DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models" as part of the Supplementary Material for anonymous review at ICLR 2024. Please do not distribute these files.

## Quick start

The following command computes influence function values on the GLUE-QNLI dataset with LoRA rank 8.

```bash
python3 launcher.py run --exp_id='config_qnli4' --run-id=0 --runpath='./'
```

## Core Python files

- `dataloader.py` constructs the tokenizers and generates noisy datasets.

- `lora_model.py` defines the LoRA modules.

- `influence.py` implements the influence computation algorithms.
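
To give a feel for the kind of computation `influence.py` covers, here is a minimal sketch of the DataInf closed-form approximation from the paper, for a single layer. It is an illustration only, not the code in this repository: the helper names `dot` and `datainf_influences` are hypothetical, gradients are plain Python lists to keep the example dependency-free, and the default damping (0.1 times the mean squared gradient norm divided by the layer dimension) is an assumption based on the choice described in the paper. For a full model, the per-layer values would be summed over layers.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def datainf_influences(
    train_grads: Sequence[Vector],
    val_grad: Vector,
    damping_scale: float = 0.1,
) -> List[float]:
    """Hypothetical helper: per-training-point influence for one layer.

    train_grads[i] is the flattened per-sample loss gradient of training
    point i; val_grad is the (average) validation loss gradient.
    """
    n = len(train_grads)
    d = len(val_grad)
    sq_norms = [dot(g, g) for g in train_grads]

    # Damping term lambda (assumed default, see the lead-in).
    lam = damping_scale * sum(sq_norms) / (n * d)
    if lam <= 0.0:
        raise ValueError("all training gradients are zero; damping is undefined")

    # r approximates H^{-1} v using the swap of inverse and average plus
    # the Sherman-Morrison identity:
    #   H^{-1} ~ 1/(n*lam) * sum_i (I - g_i g_i^T / (lam + ||g_i||^2))
    r = [0.0] * d
    for g, sq in zip(train_grads, sq_norms):
        coef = dot(g, val_grad) / (lam + sq)
        for j in range(d):
            r[j] += (val_grad[j] - coef * g[j]) / (n * lam)

    # Influence of training point k: -v^T H^{-1} g_k.
    return [-dot(r, g) for g in train_grads]


if __name__ == "__main__":
    grads = [[1.0, 0.0], [0.0, 1.0]]
    print(datainf_influences(grads, val_grad=[1.0, 0.0]))
```

Under this sign convention, a negative value means that upweighting the training point is expected to lower the validation loss. The loop never forms a `d x d` matrix, so the cost stays linear in the layer dimension.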

## Note

The implementation for the text completion and text-to-image generation tasks is essentially the same; the difference is that `influence_full.py` should be used to compute influence function values for each validation data point. We will release clean, reproducible, public versions of all our implementations once the paper is accepted.

