# chunk-attention

## Build from source

### GPU Kernel (BLAS=CUDA)

```bash
pip install mypy xformers vllm

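# Find your PyTorch installation path. The command below prints
# <...>/site-packages/torch/share/cmake; pass the <...>/site-packages/torch part as -DTORCH.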
python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)'

# Linux
cmake -S . -B build -DTORCH=</path/to/python>/site-packages/torch -DUSE_MKL=OFF -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build

python tests/test_chunk_attn.py
```
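The `-DTORCH` value can also be derived in the shell instead of copied by hand. The following is a minimal sketch, assuming `torch.utils.cmake_prefix_path` ends in `share/cmake` as it does in current PyTorch releases. `torch_root_from_prefix` is a hypothetical helper name, not part of this repository, and the path in the example is made up.

```bash
# Hypothetical helper (not part of this repository): turn the directory printed
# by torch.utils.cmake_prefix_path into the PyTorch package root.
torch_root_from_prefix() {
    local prefix="${1%/}"                    # drop one trailing slash, if present
    printf '%s\n' "${prefix%/share/cmake}"   # strip the trailing share/cmake
}

# Example with a made-up path:
torch_root_from_prefix /opt/venv/lib/python3.10/site-packages/torch/share/cmake
# prints /opt/venv/lib/python3.10/site-packages/torch
```

With the helper defined, the configure step could become `cmake -S . -B build -DTORCH="$(torch_root_from_prefix "$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)')")" -DUSE_MKL=OFF -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release`.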

### CPU Kernel (Experimental, BLAS=MKL)

```bash
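# Find your PyTorch installation path. The command below prints
# <...>/site-packages/torch/share/cmake; pass the <...>/site-packages/torch part as -DTORCH.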
python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)'

# Linux
# If MKL is not installed yet: sudo apt install intel-oneapi-mkl-devel
cmake -S . -B build -DTORCH=</path/to/python>/site-packages/torch -DUSE_MKL=ON -DUSE_CUDA=OFF -DMKL_LINK=static -DCMAKE_BUILD_TYPE=Release
cmake --build build

# Windows
cmake -S . -B build -DTORCH=</path/to/python>/site-packages/torch -DUSE_MKL=ON -DUSE_CUDA=OFF -DMKL_LINK=static
cmake --build build --config RelWithDebInfo

python tests/test_chunk_attn.py
```

MKL CMake options:
* `-DMKL_LINK=[static,dynamic]`
* `-DMKL_THREADING=[sequential,parallel]`
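
As an illustration of how these options combine, the Linux configure command from above could be written with dynamic linking and parallel threading. This is only a sketch that reuses the flags listed in this README. The `</path/to/python>` placeholder still has to be replaced with a real path, and this particular combination is an example, not a recommendation.

```bash
# Same Linux CPU build as above, but with MKL linked dynamically and
# MKL threading set to parallel (both values taken from the option list above).
cmake -S . -B build -DTORCH=</path/to/python>/site-packages/torch -DUSE_MKL=ON -DUSE_CUDA=OFF -DMKL_LINK=dynamic -DMKL_THREADING=parallel -DCMAKE_BUILD_TYPE=Release
cmake --build build
```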

## Tips

### Change HuggingFace Cache Path

```bash
# Newer transformers releases deprecate TRANSFORMERS_CACHE in favor of HF_HOME.
export TRANSFORMERS_CACHE=/mnt/huggingface/
```