<div align="center">

<div class="logo">
      <img src="assets/logo.png" style="width:180px">
</div>

<h1>DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling</h1>

<div>
    <strong>NeurIPS 2025 Submission 1112 Authors</strong>
</div>
<div>
    Anonymous Affiliations
</div>

<div>
    <strong><em>Diffusion ConvNet is Stronger than you Think!</em></strong>
</div>



![](assets/teaser.png)


</div>

<br>


## 🔥 News
- **2025.5.18**: We released the code for DiCo. Model checkpoints will be made available after the double-blind review.

![](assets/fig1.png)


## 🎰 Training
#### I - Prepare training data
As in [fast-DiT](https://github.com/chuanyangjin/fast-DiT), we use a VAE to extract ImageNet features once, before training starts:
```shell
torchrun --nnodes=1 --nproc_per_node=1 --master_port=1234 extract_features.py \
    --model DiT-XL/2 \
    --data-path /path/to/imagenet/train \
    --features-path /path/to/store/features
```
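Before moving on, it can be worth checking that the extraction step actually wrote something. The sketch below counts `.npy` files under the feature directory. It assumes features are saved as `.npy` files, as fast-DiT does; the helper name `count_npy_files` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repo): count extracted .npy files.
# Assumption: extract_features.py stores features as .npy files, as fast-DiT does.
count_npy_files() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # Search subdirectories too; strip the padding some `wc` builds add.
    find "$dir" -type f -name '*.npy' | wc -l | tr -d ' '
}

# Usage, with the placeholder path from the command above:
# count_npy_files /path/to/store/features
```

A count of `0` suggests the extraction did not finish or wrote to a different location.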
#### II - Training for DiCo
To launch DiCo-XL (256x256) training with `8` GPUs on one node:
```shell
export WANDB_API_KEY='YOUR_WANDB_API_KEY'
accelerate launch \
    --multi_gpu \
    --num_processes=8 \
    --main_process_port=1234 \
    --mixed_precision=no \
    train_accelerate.py \
    --feature-path=/path/to/store/features \
    --image-size=256 \
    --model-domain=dico \
    --model=DiCo-XL \
    --results-dir=/path/to/store/exp/results \
    --exp-name=DiCo-XL-256
```
To launch DiCo-XL (256x256) training with `32` GPUs on four nodes:
```shell
export WANDB_API_KEY='YOUR_WANDB_API_KEY'
accelerate launch \
    --multi_gpu \
    --num_processes=32 \
    --num_machines=4 \
    --main_process_ip=... \
    --main_process_port=1234 \
    --machine_rank=... \
    --mixed_precision=no \
    train_accelerate.py \
    --feature-path=/path/to/store/features \
    --image-size=256 \
    --model-domain=dico \
    --model=DiCo-XL \
    --results-dir=/path/to/store/exp/results \
    --exp-name=DiCo-XL-256
```
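In a multi-node run, the same command is started on every node and only `--machine_rank` differs, while `--num_processes` is the total across all machines (4 nodes x 8 GPUs = 32 above). A minimal sketch of a helper that builds these flags is shown below. The function name `dist_flags` and the environment variables `NODE_RANK` and `MASTER_ADDR` are hypothetical; set them however your cluster scheduler exposes them.

```shell
# Hypothetical helper: print the distributed flags for one node.
# Arguments: <num_nodes> <gpus_per_node> <machine_rank> <main_ip> [port]
dist_flags() {
    num_nodes="$1"
    gpus_per_node="$2"
    machine_rank="$3"
    main_ip="$4"
    port="${5:-1234}"
    # accelerate expects the total process count across all machines.
    echo "--multi_gpu" \
         "--num_processes=$((num_nodes * gpus_per_node))" \
         "--num_machines=${num_nodes}" \
         "--main_process_ip=${main_ip}" \
         "--main_process_port=${port}" \
         "--machine_rank=${machine_rank}"
}

# Usage on each node (NODE_RANK and MASTER_ADDR are assumed to be set by you):
# accelerate launch $(dist_flags 4 8 "$NODE_RANK" "$MASTER_ADDR") \
#     --mixed_precision=no \
#     train_accelerate.py \
#     --feature-path=/path/to/store/features \
#     --image-size=256 \
#     --model-domain=dico \
#     --model=DiCo-XL \
#     --results-dir=/path/to/store/exp/results \
#     --exp-name=DiCo-XL-256
```

Deriving the process count from the node and GPU counts keeps the two numbers from drifting apart when the cluster size changes.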

## ⚡ Evaluation (FID, Inception Score, etc.)
To sample 50K images from a pre-trained DiCo-XL model across `8` GPUs, run:

```shell
torchrun --nnodes=1 --nproc_per_node=8 --master_port=1234 \
    sample_ddp.py \
    --ckpt=/path/to/ckpt.pt \
    --image-size=256 \
    --model=DiCo-XL \
    --model-domain=dico \
    --cfg-scale=1.0
```
This script generates a folder of samples and a `.npz` file that can be passed directly to [ADM's TensorFlow evaluation suite](https://github.com/openai/guided-diffusion/tree/main/evaluations) to compute FID, Inception Score, and other metrics.
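
As a rough sketch of that last step: the ADM suite's `evaluator.py` takes a reference batch and a sample batch as positional arguments. The wrapper below only checks that both files exist before starting the evaluator. The function name `run_adm_eval` is hypothetical, and it assumes you run it from the `evaluations` directory of the guided-diffusion repository with that suite's dependencies installed.

```shell
# Hypothetical wrapper: refuse to start the evaluator if an input is missing.
# Assumption: the current directory is guided-diffusion/evaluations.
run_adm_eval() {
    ref_batch="$1"      # reference batch, e.g. VIRTUAL_imagenet256_labeled.npz
    sample_batch="$2"   # the .npz file written by sample_ddp.py
    for f in "$ref_batch" "$sample_batch"; do
        if [ ! -f "$f" ]; then
            echo "file not found: $f" >&2
            return 1
        fi
    done
    python evaluator.py "$ref_batch" "$sample_batch"
}

# Usage (the sample path is a placeholder):
# run_adm_eval VIRTUAL_imagenet256_labeled.npz /path/to/samples.npz
```

Checking the inputs first avoids waiting for TensorFlow to start up only to fail on a mistyped path.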