# Installation

```
mamba create --name lra python=3.10 -y
mamba activate lra

pip install llm-foundry==0.18.0
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install datasets==3.6.0
pip install fire
pip install sentence-transformers

git clone https://github.com/MadryLab/trak.git && cd trak/fast_jl && pip install . && cd .. && pip install . && cd ..
pip install git+https://github.com/IST-DASLab/influence_distillation.git
pip install git+https://github.com/Dao-AILab/fast-hadamard-transform.git
pip install flash-attn --no-build-isolation

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu12==25.8.*" "dask-cudf-cu12==25.8.*" "cuml-cu12==25.8.*" \
    "cugraph-cu12==25.8.*" "nx-cugraph-cu12==25.8.*" "cuxfilter-cu12==25.8.*" \
    "cucim-cu12==25.8.*" "pylibraft-cu12==25.8.*" "raft-dask-cu12==25.8.*" \
    "cuvs-cu12==25.8.*"
```
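
After installation, a quick import check can confirm that the main dependencies are discoverable. The snippet below is a minimal sketch and not part of this repository: the module names (for example `llmfoundry` for `llm-foundry`) are assumptions about how each package is imported, and the list covers only a subset of what `requirements.txt` may install.

```python
from importlib.util import find_spec

# Assumed top-level import names for some of the packages installed above;
# adjust the list if a package exposes a different module name.
EXPECTED_MODULES = [
    "torch", "torchvision", "torchaudio", "llmfoundry", "fire",
    "sentence_transformers", "trak", "fast_hadamard_transform",
    "flash_attn", "cudf", "cuml",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

The check only locates modules without importing them, so it does not verify that CUDA extensions load correctly on the GPU.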


# Preprocessing

### Embedding

```
CUDA_VISIBLE_DEVICES=0 python ss_embd.py --dset gsm8k --embd_type bert
```

### Loss Computation 

```
CUDA_VISIBLE_DEVICES=0 python ss_loss_grad.py --model_path meta-llama/Meta-Llama-3-8B-Instruct --dset gsm8k --skip_grad
```


# Fine-tuning
Fine-tuning requires a single GPU with 80 GB of memory. For the chosen method, `ss_select.py` selects the training subset, fine-tunes on it, evaluates the resulting model, and reports the evaluation accuracy.

```
# uniform sampling
CUDA_VISIBLE_DEVICES=0 python ss_select.py --dset gsm8k --method uniform --size 2000 --seed 42

# cluster-based sensitivity sampling
CUDA_VISIBLE_DEVICES=0 python ss_select.py --dset gsm8k --method ss_cluster --size 2000 --seed 42

# low-rank sensitivity sampling
CUDA_VISIBLE_DEVICES=0 python ss_select.py --dset gsm8k --method ss_lowrank --size 2000 --seed 42

# training on the full dataset 
CUDA_VISIBLE_DEVICES=0 python ss_select.py --dset gsm8k --method full
```
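
To compare all methods in one pass, the commands above can be generated programmatically. This is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper that uses only the flags shown above (`--dset`, `--method`, `--size`, `--seed`), and it prints the commands rather than executing them.

```python
import shlex

# Subset-selection methods taken from the commands above.
SUBSET_METHODS = ["uniform", "ss_cluster", "ss_lowrank"]


def build_command(method, dset="gsm8k", size=2000, seed=42):
    """Build the ss_select.py argument list for one selection method."""
    cmd = ["python", "ss_select.py", "--dset", dset, "--method", method]
    # The full-dataset run above is shown without --size or --seed,
    # so those flags are added only for subset methods.
    if method != "full":
        cmd += ["--size", str(size), "--seed", str(seed)]
    return cmd


if __name__ == "__main__":
    for method in SUBSET_METHODS + ["full"]:
        # Prefix with the GPU selection used throughout this README.
        print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(build_command(method)))
```

Piping the printed lines to `bash` would run the methods one after another on the same GPU.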