# Stitch: Training-Free Position Control in Multimodal Diffusion Transformers

This repository contains the code used in the submission:  
**_Stitch: Training-Free Position Control in Multimodal Diffusion Transformers_**

## Repository Structure

### `stitch/`

This folder contains the code to integrate Stitch with the following models:

- **FLUX**
- **Qwen-Image**
- **SD3.5**

For each model, the following files are included:

1. **`[model]_stitch_main.py`**  
   Generates an image from a given prompt using `[model]` + Stitch.

2. **`[model]_forward_with_bbs.py`**  
   Contains the modifications to the model's forward pass needed to run it with bounding boxes.

3. **`[model]_constrain_attention.py`**  
   Defines transformer block constraints and saves Cutout maps.

This folder also contains the environment requirements in **`requirements.yml`**.
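
To give a concrete picture of what "constraining attention to a bounding box" can mean in a multimodal DiT, here is a minimal, hypothetical sketch. It is not the code in `[model]_constrain_attention.py`. It assumes boxes in normalized `(x0, y0, x1, y1)` coordinates, image tokens on a row-major `grid_h x grid_w` latent grid, and a joint sequence ordered as `[text tokens, image tokens]`. The helper names `box_to_token_mask` and `region_attention_mask` are illustrative only.

```python
import numpy as np


def box_to_token_mask(box, grid_h, grid_w):
    """Mark the latent patch tokens whose centers fall inside `box`.

    Hypothetical conventions (not taken from this repository):
    - `box` is (x0, y0, x1, y1) in normalized [0, 1] image coordinates;
    - image tokens form a grid_h x grid_w grid flattened in row-major order.
    Returns a boolean array of shape (grid_h * grid_w,).
    """
    x0, y0, x1, y1 = box
    # Normalized center coordinate of each patch row / column.
    ys = (np.arange(grid_h) + 0.5) / grid_h
    xs = (np.arange(grid_w) + 0.5) / grid_w
    rows_in = (ys >= y0) & (ys < y1)
    cols_in = (xs >= x0) & (xs < x1)
    # Outer product gives the 2D region; flatten to match the token order.
    return np.outer(rows_in, cols_in).reshape(-1)


def region_attention_mask(box, grid_h, grid_w, num_text_tokens, object_text_idx):
    """Build a joint text+image attention mask (True = attention allowed).

    Assumes the joint sequence is [text tokens, image tokens]. Image tokens
    inside `box` may attend only to other in-box image tokens and to the text
    tokens listed in `object_text_idx`; all other queries are unconstrained.
    """
    n = num_text_tokens + grid_h * grid_w
    mask = np.ones((n, n), dtype=bool)

    # Positions of the joint sequence that are in-box image tokens.
    in_region = np.zeros(n, dtype=bool)
    in_region[num_text_tokens:] = box_to_token_mask(box, grid_h, grid_w)

    # Keys that in-box queries are allowed to see.
    allowed_keys = in_region.copy()
    allowed_keys[np.asarray(object_text_idx, dtype=int)] = True

    # Restrict every in-box query row to the allowed keys (broadcast over rows).
    mask[in_region] = allowed_keys
    return mask


# Example: 4x4 latent grid, box covering the top-left quadrant, 3 text tokens,
# of which token 1 is assumed to name the object placed in the box.
m = region_attention_mask((0.0, 0.0, 0.5, 0.5), 4, 4, 3, [1])
print(m.shape)               # (19, 19)
print(np.flatnonzero(m[3]))  # [1 3 4 7 8]
```

A boolean mask with `True` meaning "may attend" matches the convention of PyTorch's `scaled_dot_product_attention`, so a mask like this could in principle be converted with `torch.from_numpy` and passed as `attn_mask`; how Stitch actually applies its constraints is defined in the repository code.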

---

### `bounding_boxes/`

Contains the bounding boxes used with Stitch on **PosEval** to produce the results reported in the paper.

---

### `poseval_evaluation/`

Includes modifications to the **GenEval** evaluation code to support evaluation on **PosEval**.
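
Position-based evaluation ultimately reduces to checking a spatial relation between two detected objects. The sketch below illustrates that idea with a simple center-comparison rule. It is a hypothetical simplification, not the GenEval or PosEval criterion: the box format `(x0, y0, x1, y1)` in pixels with y pointing down, the `margin` tolerance, and the function names are all assumptions for illustration.

```python
def box_center(box):
    """Center (cx, cy) of a box given as (x0, y0, x1, y1), y pointing down."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def relative_position(box_a, box_b, margin=0.1):
    """Spatial relations of object A with respect to object B.

    A relation is reported only when the centers differ by more than
    `margin` times the larger box extent along that axis, so near-aligned
    objects count as neither left/right nor above/below. This margin rule is
    a hypothetical simplification, not the GenEval/PosEval criterion.
    """
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    tol_x = margin * max(box_a[2] - box_a[0], box_b[2] - box_b[0])
    tol_y = margin * max(box_a[3] - box_a[1], box_b[3] - box_b[1])
    relations = set()
    if ax < bx - tol_x:
        relations.add("left of")
    if ax > bx + tol_x:
        relations.add("right of")
    if ay < by - tol_y:
        relations.add("above")  # smaller y is higher in image coordinates
    if ay > by + tol_y:
        relations.add("below")
    return relations


def position_correct(box_a, box_b, expected, margin=0.1):
    """True if `expected` (e.g. "left of") holds for A relative to B."""
    return expected in relative_position(box_a, box_b, margin)


# Example: a dog detected on the left, a cat on the right, at the same height.
dog, cat = (10, 40, 60, 90), (120, 40, 170, 90)
print(relative_position(dog, cat))             # {'left of'}
print(position_correct(cat, dog, "right of"))  # True
```

The tolerance keeps objects that are roughly aligned on an axis from being counted as satisfying a relation on that axis; the actual thresholds and detector used for PosEval are those in the evaluation code in this folder.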

---

### `poseval_prompts/`

Houses the prompts for **PosEval**, covering:

- All five novel categories
- The original **GenEval** Position task (`2 Obj`)

