# README for FlashMask ICLR 2025 Submission

## Directory Structure

The submitted zip package contains the following files:

```
.
├── apply_mask.h                # Core kernel for apply_mask
├── precompute_min_max.h        # Core kernel that precomputes the min and max row indices (Algorithm 1, line 4)
├── fwd_kernel.cu               # Forward core kernel for Algorithm 1 in the main paper
├── bwd_kernel.cu               # Backward core kernel for Algorithm 2 in Appendix A.1
├── benchmark.py                # Python script that runs the kernel benchmark and reports the results
├── kernel_test_seq_info.txt    # Test data in Appendix A.5.2
└── README.md
```

## Code Usage Instructions

FlashMask is implemented on top of the PaddlePaddle deep learning framework and builds upon FlashAttention-2. Because the full engineering codebase is extensive, this package contains only the core kernel implementations that correspond to the algorithm descriptions in the paper, so that reviewers can quickly map the code to the algorithms. We also provide an executable Python script that reproduces the data presented in Figure 5 of Section 5.4 in the main text.

## How to Use

### Environment Requirements

- Python 3.10
- CUDA >= 12.0
- NVIDIA A100-SXM 80G GPU
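
The requirements above can be checked before installing anything. The following is a minimal sketch that uses only the Python standard library; the helper names `parse_cuda_version`, `check_environment`, and `read_nvidia_smi` are hypothetical and are not part of the submission. It assumes `nvidia-smi` is on the `PATH`. Note that the `CUDA Version` shown in the `nvidia-smi` banner is the highest version the driver supports, which may differ from the installed toolkit version.

```python
import re
import shutil
import subprocess
import sys


def parse_cuda_version(smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' banner, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_environment(python_version, smi_output):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    smi_output = smi_output or ""
    major_minor = tuple(python_version[:2])
    if major_minor != (3, 10):
        problems.append("Python 3.10 is listed; found %d.%d" % major_minor)
    cuda = parse_cuda_version(smi_output)
    if cuda is None:
        problems.append("could not determine a CUDA version from nvidia-smi")
    elif cuda < (12, 0):
        problems.append("CUDA >= 12.0 is listed; driver reports %d.%d" % cuda)
    # Heuristic only: looks for the substring "A100" in the device table.
    if "A100" not in smi_output:
        problems.append("no A100 GPU found in nvidia-smi output")
    return problems


def read_nvidia_smi():
    """Run nvidia-smi with no arguments; return its text, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    issues = check_environment(sys.version_info, read_nvidia_smi())
    for issue in issues:
        print("WARNING:", issue)
    print("environment looks OK" if not issues else "see warnings above")
```

A warning here does not necessarily mean the benchmark will fail; it only indicates that the machine differs from the listed configuration, so the measured numbers may differ from those reported in the paper.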

### Installation

Install the nightly build of PaddlePaddle by following the official installation guide, which is available at:

[https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html)

To install the nightly GPU package (the index URL below targets the CUDA 12.3 build), run the following command:

```bash
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
```
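
Before running the benchmark, it may help to confirm that the installed package is a GPU build and can see the device. The sketch below is one way to do this; the helper name `paddle_summary` is hypothetical. It relies on `paddle.__version__`, `paddle.is_compiled_with_cuda()`, `paddle.device.cuda.device_count()`, and `paddle.utils.run_check()`, and it returns `None` rather than failing if `paddle` is not importable.

```python
import importlib.util


def paddle_summary():
    """Return a dict describing the installed PaddlePaddle build, or None if absent."""
    if importlib.util.find_spec("paddle") is None:
        return None
    import paddle  # imported lazily so the script still runs without paddle

    return {
        "version": paddle.__version__,
        "compiled_with_cuda": paddle.is_compiled_with_cuda(),
        "visible_gpus": paddle.device.cuda.device_count(),
    }


if __name__ == "__main__":
    summary = paddle_summary()
    if summary is None:
        print("paddle is not importable; re-run the pip command above")
    else:
        print(summary)
        import paddle

        # Built-in self-test that runs a small computation on the device.
        paddle.utils.run_check()
```

If `compiled_with_cuda` is `False` or `visible_gpus` is `0`, a CPU-only wheel was probably installed or the GPU is not visible to the process, and `benchmark.py` is unlikely to run as intended.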

### Execution

Run the following command to execute the benchmark:

```bash
python benchmark.py
```

When the run finishes, the benchmark results are printed to the screen and also saved to a CSV file.
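
The saved CSV file can be inspected with the standard library alone. The sketch below makes no assumption about the file name or the column names written by `benchmark.py`: the path is passed on the command line, and the columns are read from the header row. The helper name `parse_results` and the script name `inspect_results.py` are hypothetical.

```python
import csv
import io
import sys


def parse_results(text):
    """Parse CSV text into (column_names, rows); each row is a dict keyed by column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python inspect_results.py <path-to-csv>
    with open(sys.argv[1], newline="") as handle:
        columns, rows = parse_results(handle.read())
    print("columns:", ", ".join(columns))
    print("rows:", len(rows))
    for row in rows[:5]:  # preview the first few rows only
        print(row)
```

All values are returned as strings, so numeric columns need an explicit `float(...)` conversion before any further analysis.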
