# Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation

This paper introduces *Fourier Token Merging*, a new method for understanding and capitalizing on the frequency domain for efficient image generation. We find that transforming tokens into a frequency-domain representation before clustering lets the clustering better exploit the underlying redundancy after de-correlation. Through analytical and empirical studies, we demonstrate the benefits of Fourier clustering over clustering in the original time domain. We evaluated Fourier Token Merging on the Stable Diffusion model, and the results show up to a 25% reduction in latency without impairing image quality.

The code used for the empirical experiments is available in this repository for validation and replication.
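
To make the idea of clustering in the frequency domain concrete, here is a minimal NumPy sketch. It is not the implementation in `src`. It assumes an FFT along the channel axis, keeps only the first `num_freqs` coefficients as matching features, and pairs tokens with a bipartite matching in the style of token merging. The names `fourier_features`, `merge_tokens`, and `num_freqs` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fourier_features(tokens: np.ndarray, num_freqs: int = 4) -> np.ndarray:
    """Return unit-norm matching features from the low-frequency FFT coefficients.

    Assumption: the FFT is taken along the channel axis of (N, C) tokens.
    """
    spectrum = np.fft.rfft(tokens, axis=-1)[:, :num_freqs]  # (N, num_freqs), complex
    feats = np.concatenate([spectrum.real, spectrum.imag], axis=-1)
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / np.maximum(norms, 1e-12)  # guard against all-zero tokens


def merge_tokens(tokens: np.ndarray, r: int, num_freqs: int = 4) -> np.ndarray:
    """Merge the r most similar (source, destination) token pairs by averaging."""
    tokens = np.asarray(tokens, dtype=float)
    feats = fourier_features(tokens, num_freqs)

    # Split tokens into two alternating sets: sources and destinations.
    src_idx = np.arange(0, len(tokens), 2)
    dst_idx = np.arange(1, len(tokens), 2)

    # Cosine similarity between every source and every destination.
    sim = feats[src_idx] @ feats[dst_idx].T
    best_dst = sim.argmax(axis=1)
    best_sim = sim.max(axis=1)

    # The r sources with the highest similarity are merged; the rest are kept.
    order = np.argsort(-best_sim)
    merged_src = order[:r]
    kept_src = np.sort(order[r:])

    # Average each merged source into its best-matching destination.
    src = tokens[src_idx]
    dst = tokens[dst_idx].copy()
    counts = np.ones(len(dst_idx))
    np.add.at(dst, best_dst[merged_src], src[merged_src])
    np.add.at(counts, best_dst[merged_src], 1)
    dst /= counts[:, None]

    return np.concatenate([src[kept_src], dst], axis=0)
```

Calling `merge_tokens(tokens, r)` on an `(N, C)` array returns `N - r` tokens. The transform, the feature choice, and the matching rule are all assumptions here; the paper and the code in `src` define the actual procedure.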

## Pre-requisites
This project was developed primarily on a CUDA GPU. No CUDA compilation is required. We have tested it on an RTX 4090.


## Setup

```sh
conda create -n llm python=3.9
```

As always, activate the environment and install dependencies:
```sh
conda activate llm
pip3 install -r requirements.txt
```

## Datasets 
We use the ImageNet evaluation dataset. First, download the ImageNet-eval dataset from the ILSVRC2012 directory and store it in the `data` directory. Then run `sample_5000.py` to select 5 images from each class and store them in a new, separate directory. Finally, use `resize.py` to resize the sampled images and place them in `data/imagenet_val_1000flat_resized`.
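
The per-class sampling step can be pictured with the short sketch below. It does not reproduce `sample_5000.py`. It assumes the source images are organized in one subfolder per class, which may differ from how the dataset is laid out on disk. The function name `sample_per_class` and its parameters are hypothetical.

```python
import random
import shutil
from pathlib import Path


def sample_per_class(src_root, dst_root, per_class=5, seed=0):
    """Copy `per_class` randomly chosen files from each class subfolder into a flat folder.

    Assumption: `src_root` contains one subdirectory per class.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)

    copied = []
    for class_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        for image in rng.sample(images, min(per_class, len(images))):
            # Prefix with the class name so file names stay unique in the flat folder.
            target = dst_root / f"{class_dir.name}_{image.name}"
            shutil.copy2(image, target)
            copied.append(target)
    return copied
```

With 1000 classes and `per_class=5`, this yields 5000 images in the destination folder.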


## Models
We experimented with Stable Diffusion (SD) v1.5 models (CC-BY 4.0).

## Obtaining Results
The Fourier Token Merging implementation is located in the `src` directory. The `run_experiments_{*}.sh` scripts evaluate the methods and obtain the results. Each script generates two images per class with Fourier Token Merging and stores them in the EXP directory. It then outputs the generation latency and calls APIs to compute the similarity between the generated images and the baseline obtained in the Datasets section.
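
For readers who want to check latency numbers independently, the following sketch shows one common way to time a generation call and compute the relative reduction. It is not taken from the experiment scripts; `measure_latency` and `relative_reduction` are hypothetical helpers, and `fn` stands in for whatever callable runs one generation.

```python
import statistics
import time


def measure_latency(fn, warmup=1, runs=3):
    """Return the mean wall-clock time in seconds of `fn()` over `runs` calls.

    Note: CUDA work is asynchronous, so when timing GPU code `fn` should
    synchronize the device before returning; otherwise the timing is too short.
    """
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def relative_reduction(baseline_s, merged_s):
    """Fraction of latency saved relative to the baseline (0.25 means 25%)."""
    return (baseline_s - merged_s) / baseline_s
```

For example, a baseline of 4.0 s and a merged run of 3.0 s give `relative_reduction(4.0, 3.0) == 0.25`.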


## Copyright
The copyright of this repository belongs to the authors of the NeurIPS'2025 paper submission (#24047). The purpose of this package is only for the assessment by the NeurIPS'2025 program committee during the paper review process; any other uses for any other purposes are prohibited.