# Our Project Page: https://mild-multi-layer-diffusion.github.io/
# Our Dataset: https://mild-multi-layer-diffusion.github.io/dataset/


## 📋 Supplementary Material Limitations

Because of the size limit on supplementary materials, this submission does not include the complete model weights or the full curated dataset. We commit to open-sourcing the full model weights, training data, and additional resources upon acceptance. For more information and updates, please visit our project homepage.




## 📋 Project Overview

This project implements a dual-branch LoRA architecture for image inpainting. Key features include:

- **Dual LoRA Architecture**: Separate processing for foreground and background regions to improve restoration quality
- **Layer-Aware Mechanism**: Different LoRA strategies for different network layers
- **Spatially-Modulated Attention (SMA)**: Enhanced perception of mask regions
- **Progressive Training**: Multi-stage training strategy for optimized convergence
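
To make the dual-branch idea concrete, here is a minimal sketch of how two LoRA updates could be blended by the mask on top of a frozen base layer. It is written in plain Python so it runs without PyTorch. The blending rule, the treatment of the masked region as the foreground, and every name in it (`dual_lora_forward`, `lora_delta`, `mask_value`) are assumptions for illustration, not the code in `layer_aware_dual_lora.py`.

```python
# Illustrative only: a mask-blended dual LoRA update for one feature vector.
# The blending rule and all names here are assumptions, not the project's code.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_delta(down, up, x, scale=1.0):
    """Low-rank update: scale * up @ (down @ x)."""
    return [scale * v for v in matvec(up, matvec(down, x))]


def dual_lora_forward(base_weight, fg_lora, bg_lora, x, mask_value):
    """Frozen base output plus foreground/background LoRA updates.

    mask_value lies in [0, 1]: 1.0 means the position is fully inside the
    mask (foreground branch), 0.0 fully outside (background branch).
    """
    base = matvec(base_weight, x)
    fg = lora_delta(fg_lora["down"], fg_lora["up"], x)
    bg = lora_delta(bg_lora["down"], bg_lora["up"], x)
    return [b + mask_value * f + (1.0 - mask_value) * g
            for b, f, g in zip(base, fg, bg)]


# Tiny example: 2-dimensional features, rank-1 adapters.
W = [[1.0, 0.0], [0.0, 1.0]]                        # frozen base weight (identity)
fg = {"down": [[1.0, 1.0]], "up": [[0.5], [0.0]]}   # rank-1 foreground adapter
bg = {"down": [[1.0, -1.0]], "up": [[0.0], [0.5]]}  # rank-1 background adapter
x = [2.0, 1.0]

inside = dual_lora_forward(W, fg, bg, x, mask_value=1.0)   # only fg adapter acts
outside = dual_lora_forward(W, fg, bg, x, mask_value=0.0)  # only bg adapter acts
print(inside, outside)
```

With intermediate mask values both adapters contribute, which is one plausible way to handle soft mask boundaries.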

## 🏗️ Project Structure

```
submission/
├── code/
│   ├── README.md                           # This file
│   ├── train_unet.py                       # Training script
│   ├── inference.py                        # Inference script
│   ├── dual_lora_inpaint_pipeline.py       # Dual LoRA inpainting pipeline
│   ├── config.py                           # Configuration file
│   └── layer_aware_dual_lora.py            # Layer-aware dual LoRA model
├── models/
│   ├── lora_weights.pth                    # Trained LoRA weights
│   └── lora_weights_alpha.pth              # Alpha weights
└── sample_dataset/                         # Sample dataset
```

## 🚀 Quick Start

### Requirements

- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU acceleration)
- A GPU with at least 16 GB of memory

### Installation

```bash
# Clone the repository
git clone <repository-url>
cd submission

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install diffusers transformers accelerate
pip install opencv-python pillow numpy tqdm
```

### Inference

#### Single Image Inference

```bash
python code/inference.py \
    --image path/to/input/image.jpg \
    --mask path/to/mask/image.png \
    --output path/to/output/image.png \
    --checkpoint models/
```

#### Parameter Description

- `--image`: Input image path
- `--mask`: Mask image path (white regions indicate areas to be inpainted)
- `--output`: Output image save path
- `--checkpoint`: Model weights directory path
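
The four flags above map naturally onto an `argparse` parser. The sketch below shows one way such a parser could look. Which flags are required and the `models/` default are assumptions, and `build_parser` is a hypothetical name rather than a function taken from `inference.py`.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (required/default are assumed)."""
    parser = argparse.ArgumentParser(
        description="Dual LoRA inpainting inference (sketch)")
    parser.add_argument("--image", required=True, help="Input image path")
    parser.add_argument("--mask", required=True,
                        help="Mask image path; white marks the region to inpaint")
    parser.add_argument("--output", required=True, help="Output image save path")
    parser.add_argument("--checkpoint", default="models/",
                        help="Model weights directory path")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args([
    "--image", "input.jpg",
    "--mask", "mask.png",
    "--output", "result.png",
])
print(args.image, args.mask, args.output, args.checkpoint)
```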

### Training

#### Dataset Preparation

The dataset should contain the following files:
- `meta.json`: Contains image paths, mask paths, and other metadata
- Image files: Original images and corresponding mask images
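
Before training, it can help to confirm that every path listed in `meta.json` resolves to a file. The sketch below assumes a hypothetical schema, namely a JSON list of records with `image` and `mask` keys holding paths relative to the directory of `meta.json`. The actual schema is not specified here, so adjust `REQUIRED_KEYS` and the path handling to match the real file.

```python
import json
import os
import tempfile

# Hypothetical schema: a list of records with "image" and "mask" keys.
REQUIRED_KEYS = ("image", "mask")


def check_meta(meta_path):
    """Return a list of problems found in meta.json (empty if none)."""
    root = os.path.dirname(os.path.abspath(meta_path))
    with open(meta_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    problems = []
    for index, record in enumerate(records):
        for key in REQUIRED_KEYS:
            if key not in record:
                problems.append(f"record {index}: missing '{key}'")
            elif not os.path.isfile(os.path.join(root, record[key])):
                problems.append(f"record {index}: file not found: {record[key]}")
    return problems


# Self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0001.jpg", "0001_mask.png"):
        open(os.path.join(tmp, name), "wb").close()  # empty placeholder files
    meta = [
        {"image": "0001.jpg", "mask": "0001_mask.png"},
        {"image": "0002.jpg", "mask": "0002_mask.png"},  # files intentionally absent
    ]
    meta_path = os.path.join(tmp, "meta.json")
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump(meta, f)
    demo_problems = check_meta(meta_path)

for problem in demo_problems:
    print(problem)
```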

#### Start Training

```bash
python code/train_unet.py
```

Training parameters can be configured in `config.py`, including:

- `pretrained_model_name_or_path`: Pre-trained model path
- `lora_rank`: LoRA rank
- `learning_rate`: Learning rate
- `train_batch_size`: Batch size
- `num_train_epochs`: Number of training epochs
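
As a rough picture of how these parameters could be grouped, the sketch below collects them in a dataclass with basic sanity checks. The field names come from the list above. The `TrainConfig` class, its `validate` method, and all values are placeholders for illustration, not the actual contents or defaults of `config.py`.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # All values are placeholders, not the project's actual defaults.
    pretrained_model_name_or_path: str = "path/to/pretrained/model"
    lora_rank: int = 16
    learning_rate: float = 1e-4
    train_batch_size: int = 4
    num_train_epochs: int = 10

    def validate(self):
        """Raise ValueError on obviously invalid settings, else return self."""
        if self.lora_rank <= 0:
            raise ValueError("lora_rank must be a positive integer")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.train_batch_size <= 0:
            raise ValueError("train_batch_size must be a positive integer")
        if self.num_train_epochs <= 0:
            raise ValueError("num_train_epochs must be a positive integer")
        return self


# Override a single field, for example to fit a smaller GPU.
config = TrainConfig(train_batch_size=2).validate()
print(config)
```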



## 🐛 Troubleshooting

### Common Issues

1. **CUDA Out of Memory**
   - Reduce `train_batch_size`
   - Enable gradient checkpointing by setting `gradient_checkpointing = True`

2. **Model Loading Failed**
   - Check that the directory passed to `--checkpoint` exists and contains the weights files
   - Verify PyTorch version compatibility

3. **Training Not Converging**
   - Adjust learning rate
   - Check data quality
   - Increase number of training epochs
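
For the "Model Loading Failed" case, a quick check that the weight files are where the scripts expect them can rule out path problems before looking at version issues. The file names below come from the project structure listing. That both files must be present in the `--checkpoint` directory is an assumption, and `missing_checkpoint_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the project structure listing.
EXPECTED_FILES = ("lora_weights.pth", "lora_weights_alpha.pth")


def missing_checkpoint_files(checkpoint_dir):
    """Return the expected weight files that are absent from checkpoint_dir."""
    directory = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


missing = missing_checkpoint_files("models/")
if missing:
    print("Missing weight files:", ", ".join(missing))
else:
    print("All expected weight files are present.")
```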


