# AGDC-PubLayNet
## "Autoregressive Generation of Variable-Length Sequences with Joint Discrete and Continuous Spaces" - for PubLayNet

![assets/teaser](assets/teaser.png)

AGDC is a **unified autoregressive framework that jointly models discrete and continuous values** for high-fidelity sequence generation. Unlike traditional approaches that rely on discretization, AGDC directly models continuous values using denoising score matching while handling discrete values with cross-entropy loss.
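
As a rough illustration of the idea, a joint objective of this kind can be written as a cross-entropy term over the discrete values plus a denoising term over the continuous values. This is a generic sketch and not necessarily the exact objective used in the paper. The symbols are notation assumed here for illustration: `d_i` is the discrete value at position `i`, `x_i` the continuous value, `x_i^{(t)}` its noised version at noise level `t`, `z_i` the autoregressive context, `\epsilon_\theta` the denoising network, and `\lambda` a generic weighting factor.

```latex
\mathcal{L}
= \underbrace{-\sum_{i} \log p_\theta\left(d_i \mid z_i\right)}_{\text{cross-entropy (discrete)}}
+ \lambda \, \underbrace{\sum_{i} \mathbb{E}_{t,\; \epsilon \sim \mathcal{N}(0, I)}
  \left\| \epsilon - \epsilon_\theta\left(x_i^{(t)} \mid t, z_i\right) \right\|^2}_{\text{denoising score matching (continuous)}}
```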

This repository contains the implementation for layout generation on the PubLayNet dataset, demonstrating superior performance in high-precision settings.

## Setup

```bash
conda create -n agdc python=3.11 -y
conda activate agdc
pip install -r requirements.txt
```
- Download the processed dataset from [this anonymous link](https://drive.google.com/drive/folders/1VOxnaDUe1YLGXWHWqE9hCzlFMJHwqNA-?usp=sharing) and place it under `data/dataset/publaynet/processed`.
- Tested with CUDA 12.6 on an RTX 4090.

## `const_layout` setup (required for evaluation)
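```bash
# Clone const_layout next to the AGDC/ directory (the symlink below relies on ../AGDC)
git clone https://github.com/ktrk115/const_layout.git
cd const_layout
./download_model.sh
mkdir -p data/dataset
# Link the AGDC PubLayNet data into const_layout so eval.py can find it
ln -sf "$(pwd)/../AGDC/data/dataset/publaynet" "$(pwd)/data/dataset/publaynet"
```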

## Training
```bash
# Run from the AGDC/ directory
python main.py \
    --exp {your_exp_name} \
    --loss_weight 100 \
    --diffloss_d 6 \
    --diffloss_w 1024 \
    --n_layer 12 \
    --n_head 16 \
    --n_embd 1024 \
    --lr 7.5e-5 \
    --lr_decay \
    --warmup_iters 1000 \
    --final_iters 50000 \
    --eos_alpha 0.05 \
    --length_loss_weight 0.005
```
Model checkpoints are saved to the `logs/{your_exp_name}/ckpt` directory.
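
To pick a checkpoint for sampling, a small helper like the one below can locate the most recently written file in that directory. This is a minimal sketch: `latest_checkpoint` is a hypothetical helper that is not part of this repository, and it assumes checkpoints use the `.pth` extension, as in the sampling command.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(exp_name: str, logs_root: str = "logs") -> Optional[Path]:
    """Return the most recently modified .pth file in logs/<exp_name>/ckpt, or None."""
    ckpt_dir = Path(logs_root) / exp_name / "ckpt"
    if not ckpt_dir.is_dir():
        return None
    # Sort by modification time so the newest checkpoint comes last.
    candidates = sorted(ckpt_dir.glob("*.pth"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    # "my_exp" is a placeholder experiment name.
    print(latest_checkpoint("my_exp"))
```

The returned path can be passed to `--ckpt` when sampling.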

## Pre-trained Model
A pre-trained checkpoint for PubLayNet (following the paper's setting) is available [here](https://drive.google.com/drive/folders/1VOxnaDUe1YLGXWHWqE9hCzlFMJHwqNA-?usp=sharing).

## Sampling
```bash
# Run from the AGDC/ directory
python sample_pkl.py \
    --ckpt {path_to_your_pth}.pth \
    --diffloss_d 6 \
    --diffloss_w 1024 \
    --n_layer 12 \
    --n_head 16 \
    --n_embd 1024 \
    --batch_size 256 \
    --out_path {path_to_your_pkl}.pkl \
    --dataset_type "test" \
    --num_context_boxes 2 \
    --num_save 50 \
    --eos_alpha 0.05
```
- `--dataset_type`: data split to sample with, either `"test"` or `"train"`.
- `--num_context_boxes`: number of initial boxes used to condition sequence generation.
- `--num_save`: number of visualized outputs to save (written to the same directory as the `.pkl` file).
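
The training and sampling commands repeat the same architecture flags (`--diffloss_d`, `--diffloss_w`, `--n_layer`, `--n_head`, `--n_embd`), and these presumably need to match for a checkpoint to load. One way to keep them in sync is to build both command lines from a single dictionary. The sketch below only assembles the commands shown above; `arch_args`, `train_command`, and `sample_command` are hypothetical helper names, not part of this repository.

```python
import shlex

# Architecture flags shared by the training and sampling commands.
ARCH = {
    "diffloss_d": 6,
    "diffloss_w": 1024,
    "n_layer": 12,
    "n_head": 16,
    "n_embd": 1024,
}


def arch_args(arch: dict) -> list[str]:
    """Turn {"n_layer": 12} into ["--n_layer", "12"]."""
    args: list[str] = []
    for name, value in arch.items():
        args += [f"--{name}", str(value)]
    return args


def train_command(exp_name: str) -> list[str]:
    """Assemble the training command with the settings listed in this README."""
    return [
        "python", "main.py", "--exp", exp_name, "--loss_weight", "100",
        *arch_args(ARCH),
        "--lr", "7.5e-5", "--lr_decay", "--warmup_iters", "1000",
        "--final_iters", "50000", "--eos_alpha", "0.05",
        "--length_loss_weight", "0.005",
    ]


def sample_command(ckpt: str, out_path: str, dataset_type: str = "test",
                   num_context_boxes: int = 2, num_save: int = 50) -> list[str]:
    """Assemble the sampling command with the same architecture flags."""
    return [
        "python", "sample_pkl.py", "--ckpt", ckpt,
        *arch_args(ARCH),
        "--batch_size", "256", "--out_path", out_path,
        "--dataset_type", dataset_type,
        "--num_context_boxes", str(num_context_boxes),
        "--num_save", str(num_save), "--eos_alpha", "0.05",
    ]


if __name__ == "__main__":
    # Placeholder names; print the commands instead of running them.
    print(shlex.join(train_command("my_exp")))
    print(shlex.join(sample_command("model.pth", "samples.pkl")))
```

The resulting lists can be passed to `subprocess.run(..., check=True)` from the AGDC/ directory if running the commands from Python is preferred.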


## Evaluation
```bash
# Run from the const_layout/ directory
python eval.py publaynet {path_to_your_pkl}.pkl
```
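
Before running the evaluation, it can be useful to confirm that the sampled file loads and is not empty. The layout of the `.pkl` file is not documented in this README, so the sketch below makes no assumption about its contents and only reports the type and size of the top-level object. `summarize_pickle` is a hypothetical helper, not part of this repository. Run it inside the `agdc` environment so that any classes stored in the file can be imported, and only unpickle files you trust.

```python
import pickle
from pathlib import Path


def summarize_pickle(path) -> dict:
    """Load a pickle file and describe its top-level object."""
    with Path(path).open("rb") as f:
        obj = pickle.load(f)  # unpickling can execute code: trusted files only
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        # Show at most the first ten keys to keep the output short.
        summary["keys"] = list(obj.keys())[:10]
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py {path_to_your_pkl}.pkl
    if len(sys.argv) > 1:
        print(summarize_pickle(sys.argv[1]))
```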

## Acknowledgements
This code builds on the following open-source projects:
- [mar](https://github.com/LTH14/mar)
- [DeepLayout](https://github.com/kampta/DeepLayout)
- [const_layout](https://github.com/ktrk115/const_layout)

Thanks for the wonderful work!