# Runtime-Adaptive Pruning for LLM Inference

This repository contains the implementation for the paper "RAP: Runtime-Adaptive Pruning for LLM Inference". Results may differ from those reported in the paper due to variation in reinforcement learning (RL) training.

## Reproducing Results

To reproduce the results for Llama-2-7b-hf presented in the paper, run the following command:

```bash
bash run.sh
```

This script will:

1. Create and activate a conda environment named "rap" with Python 3.10
2. Install all required dependencies
3. Run the following components in sequence:
   - `agent_training.py`: Train the reinforcement learning agent
   - `agent_policy.py`: Generate pruning policies using the trained agent
   - `agent_pruning.py`: Apply the policies to prune the model
   - `agent_benchmark.py`: Evaluate the pruned model on various benchmarks
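
The four-stage sequence in step 3 can be sketched in Python as follows. This is a minimal illustration, not the contents of `run.sh`: the helper names `build_commands` and `run_pipeline` are hypothetical, and invoking each script without arguments is an assumption, since the actual script may pass flags or paths.

```python
import subprocess
import sys

# Script names come from the list above. Running them with no arguments
# is an assumption; run.sh may supply additional flags.
PIPELINE = [
    "agent_training.py",   # train the RL agent
    "agent_policy.py",     # generate pruning policies
    "agent_pruning.py",    # apply the policies to the model
    "agent_benchmark.py",  # evaluate the pruned model
]


def build_commands(python=sys.executable, steps=PIPELINE):
    """Return one argv list per pipeline stage, in execution order."""
    return [[python, step] for step in steps]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit code, so later stages never run on stale outputs.
    """
    for cmd in commands:
        runner(cmd, check=True)
```

Calling `run_pipeline(build_commands())` from the repository root, inside the activated "rap" environment, would run the stages one after another. Stopping at the first failure matters here because each stage presumably consumes the output of the previous one.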
