## README for ICLR 2026 Submission 8220

**Dear ICLR reviewers,**

This repository contains the core source code of AdaptInfer, released to facilitate reproducibility of our submission.

The file `modelling_sparse_llama.py` implements an inference framework for LLaMA models that incorporates our new sparsification strategy. The file `score.py` provides the implementation of our vision token ranking and selection algorithm. The file `preprocess.py` contains the code for our preliminary experiments on attention shift detection.

### Usage
To reproduce the main results reported in our manuscript, please follow these steps:
1. Clone the official repositories of [LLaVA](https://github.com/haotian-liu/LLaVA) and [SparseVLM](https://github.com/Gumpest/SparseVLMs).
2. Replace `modelling_sparse_llama.py` and `score.py` in the cloned repositories with the versions provided here.
3. Follow the instructions from LLaVA and SparseVLM to prepare the datasets and environments. Please make sure to match the dependencies specified in **Appendix G.2** of our paper.
4. Run AdaptInfer with the example command:
```bash
   CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh
```
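
Steps 1 and 2 can be sketched as a small shell helper. This is only an illustration, not part of the released code: the function name `replace_provided_files` and the directory name `adaptinfer` are hypothetical. Because this README does not state where the two files live inside each clone, the helper searches by file name, keeps a `.orig` backup of every file it overwrites, and replaces nothing if no file with that name is found.

```bash
# Hypothetical helper (not part of AdaptInfer): overwrite every file in a
# cloned repository whose name matches one of the provided files.
# Usage: replace_provided_files SRC_DIR REPO_DIR FILE...
replace_provided_files() {
    src_dir=$1
    repo_dir=$2
    shift 2
    for name in "$@"; do
        # Stop early if the provided version is missing.
        if [ ! -f "$src_dir/$name" ]; then
            echo "missing provided file: $src_dir/$name" >&2
            return 1
        fi
        # Search the clone by file name, since the exact location is not assumed.
        find "$repo_dir" -type f -name "$name" | while IFS= read -r target; do
            cp "$target" "$target.orig"    # back up the upstream version
            cp "$src_dir/$name" "$target"  # install the provided version
            echo "replaced $target"
        done
    done
}

# Example (assumes this repository was checked out as ./adaptinfer):
#   git clone https://github.com/haotian-liu/LLaVA.git
#   git clone https://github.com/Gumpest/SparseVLMs.git
#   replace_provided_files adaptinfer SparseVLMs modelling_sparse_llama.py score.py
```

Check the `replaced ...` lines printed by the helper before running the evaluation script, so that you can confirm which files were actually overwritten.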

### Notes
1. This repository only includes the **core components** specific to AdaptInfer.
2. The full code and results will be released after acceptance.