

# Structured Attention Matters to Multimodal LLMs in Document Understanding

## 🕹️ Usage
### Environment Setup
```bash
conda create -n structureM python=3.12
conda activate structureM
cd structure-matters
bash install.sh
```

### Data Preparation
- Create a data directory:
```bash
mkdir data
cd data
```
- Download the dataset from Hugging Face and place it in the data directory. You can either create a symbolic link to it or make a copy.

- Return to the project root:
```bash
cd ../
```

- Extract the data using:
```bash
python scripts/extract.py --config-name <dataset>  # (choose from mmlb / ldu / ptab / feta)
```
The extracted texts and images will be saved in `./tmp/<dataset>`.
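
To check that extraction produced output, a minimal sketch like the one below counts the files under `./tmp/<dataset>` by extension. The helper name `summarize_extracted` is hypothetical and not part of this repository, and the sketch makes no assumption about the directory layout beyond the output path stated above.

```python
from collections import Counter
from pathlib import Path


def summarize_extracted(dataset: str, root: str = "tmp") -> dict:
    """Count files under <root>/<dataset> by lowercase file extension.

    Hypothetical helper: it only walks the output directory and does not
    rely on any particular layout produced by scripts/extract.py.
    """
    out_dir = Path(root) / dataset
    if not out_dir.is_dir():
        raise FileNotFoundError(f"No extracted data found at {out_dir}")
    counts = Counter(
        (p.suffix.lower() or "<no extension>")
        for p in out_dir.rglob("*")
        if p.is_file()
    )
    return dict(counts)
```

For example, `summarize_extracted("mmlb")` run from the project root returns a dictionary mapping each extension to its file count, or raises `FileNotFoundError` if extraction has not been run yet.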



**Note:** For all experiments, `<dataset>` must be one of `mmlb` / `ldu` / `ptab` / `feta`, and `<run-name>` (required) can be any string that uniquely identifies the run.

<!-- ### Transfer OCR text into structured text

```bash
python scripts/structure_transform.py --config-name <dataset> run-name=<run-name>  
``` -->


### Generate Answers with Different Inputs
For MMLongBench and LongDocUrl, which have ground-truth retrieval results, use the following command to run different experiments.

For these experiments, `<dataset>` must be `mmlb` or `ldu`, and `<run-name>` (required) can be any string that uniquely identifies the run.

Set the `input_type` parameter in `config/base.yaml` to select the input format. Choose from: `structured-input` / `image` / `image-text`.
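
If you switch input formats often, the edit to `config/base.yaml` can be scripted. The sketch below is only an illustration: `set_input_type` is a hypothetical helper, and it assumes the config contains a line of the form `input_type: <value>` (the actual layout of `config/base.yaml` may differ, so check the file before relying on it).

```python
import re

# The three values listed in this README.
ALLOWED_INPUT_TYPES = ("structured-input", "image", "image-text")

# Assumption: the key appears on its own line as "input_type: <value>".
_INPUT_TYPE_LINE = re.compile(
    r"^([ \t]*input_type[ \t]*:[ \t]*)(\S+)(.*)$", re.MULTILINE
)


def set_input_type(yaml_text: str, value: str) -> str:
    """Return yaml_text with the first input_type value replaced by `value`."""
    if value not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"input_type must be one of {ALLOWED_INPUT_TYPES}")
    new_text, count = _INPUT_TYPE_LINE.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", yaml_text, count=1
    )
    if count == 0:
        raise KeyError("no 'input_type:' line found in the config text")
    return new_text
```

The function works on text rather than on the file, so you can inspect the result before writing it back, for example with `pathlib.Path("config/base.yaml").write_text(...)`.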


```bash
python scripts/predict.py --config-name <dataset> run-name=<run-name>  
```
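
To run the prediction command for both datasets in one go, a small wrapper along these lines may help. This is a sketch, not part of the repository: `build_predict_command` and `run_all` are hypothetical names, and the `<prefix>-<dataset>` run-name scheme is just one way to keep run names unique.

```python
import subprocess
import sys

# Datasets this section applies to (those with ground-truth retrieval results).
DATASETS_WITH_RETRIEVAL = ("mmlb", "ldu")


def build_predict_command(dataset: str, run_name: str) -> list:
    """Build the argument list for the predict command shown above."""
    if dataset not in DATASETS_WITH_RETRIEVAL:
        raise ValueError(f"dataset must be one of {DATASETS_WITH_RETRIEVAL}")
    if not run_name:
        raise ValueError("run_name is required")
    return [
        sys.executable,
        "scripts/predict.py",
        "--config-name",
        dataset,
        f"run-name={run_name}",
    ]


def run_all(run_prefix: str, dry_run: bool = True) -> list:
    """Print (dry run) or execute the predict command for each dataset."""
    commands = [
        build_predict_command(d, f"{run_prefix}-{d}")  # hypothetical naming scheme
        for d in DATASETS_WITH_RETRIEVAL
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing run
    return commands
```

Calling `run_all("exp1")` from the project root only prints the commands. Pass `dry_run=False` to execute them.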

### Attention Analysis for Single and Multiple Samples
```bash
python scripts/attention_analysis.py --config-name <dataset> run-name=<run-name>  
```
**Note:** This project provides some question samples from MMLongBench for generating heatmaps.  
These samples are located in `./results/MMLongBench/images_question_for_heat_map.json`.  
Before generating the heatmap, you need to obtain the corresponding structured text of each sample and pass it as an input parameter.
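
Before preparing the structured text, it can be useful to look at what the sample file contains. The sketch below only loads the JSON and reports its top-level shape. The schema of the file is not documented here, so the code deliberately assumes nothing about its fields, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Path as given in the note above, relative to the project root.
SAMPLES_PATH = Path("results/MMLongBench/images_question_for_heat_map.json")


def load_heatmap_samples(path=SAMPLES_PATH):
    """Load the sample file and return its parsed JSON content."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, (list, dict)):
        raise ValueError("expected a JSON array or object at the top level")
    return data


def describe(data) -> str:
    """Summarize the top-level container without assuming any field names."""
    return f"{type(data).__name__} with {len(data)} top-level entries"
```

Running `print(describe(load_heatmap_samples()))` from the project root shows how many samples are available.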


## License

This project is licensed under the terms of the Apache License 2.0.
You are free to use, modify, and distribute this software under the conditions of the license. See the LICENSE file for details.
